ESTIMATION OF MORTALITY INTENSITIES IN
ANIMAL EXPERIMENTS


A. W. 


Mathematics Panel and Biology Division, Oak Ridge National Laboratory,²
Oak Ridge, Tennessee, U. S. A. 


1. Introduction 


Methods for describing mortality distributions and for estimating 
mortality parameters have been developed, for the most part, by 
actuarial mathematicians, with the natural result that applications have 
been in terms of human populations. In the construction of human 
life tables large numbers of observations are usually available, and 
the large-sample methods used by actuaries have proved to be quite 
successful. In recent years, however, following the general trend of 
increased expenditures for research on cancer, heart disease, and other 
major causes of death, statisticians have become interested in more
specific types of mortality problems, and in their efforts have been
aided substantially by notable advances in small sample theory and 
nonparametric methods. 

One problem that has received much attention (Berkson and Gage 
[1952], Fix and Neyman [1951], Harris et al. [1950], Littell [1952]) is 
the estimation of mortality, recovery, and relapse rates for human 
beings suffering from specific diseases. Whether the data are collected 
retrospectively or prospectively, the conditions of collection seldom 
resemble those in a controlled experiment with laboratory animals. 
The collaborating physician must always think first of his patient’s 
health and only secondly of the purpose of the investigation. In 
follow-up studies individuals are invariably lost from observation, die 
from causes other than the one under study, and are admitted to studies 
at different ages. Other authors (Grenander [1956], Kaplan and Meier 
[1958], Seal [1954]) have treated the mortality rate problem somewhat 
more generally, but even in these papers the emphasis is on estimation 
of parameters that are meaningful to human populations. 

In mortality studies on animal populations, emphasis is usually
placed on different questions, namely, on longevity itself together with
the broad effects of various outside agents. In addition, animal
experiments can be controlled relatively completely, and the statistical
problems are correspondingly less severe.

¹Present address: Department of Biostatistics, The Johns Hopkins University,
Baltimore, Maryland, U.S.A.

²Operated by Union Carbide Corporation for the U.S. Atomic Energy Commission.

In the field of radiation biology, much effort is being expended to 
help elucidate the principles involved in the induction of injury by 
ionizing radiation and the subsequent repair that occurs if the organism 
survives. Some investigators (Blair [1952], Sacher [1956], Yockey 
[1956]) have even proposed mathematical models for these processes, 
and attempts to fit such models to laboratory data have pointed out 
the need for adequate statistical tools.

The purpose of this paper is to discuss nonparametric methods for 
estimating mortality intensities (defined in the next section) from small 
samples in controlled experiments uncomplicated by losses and age 
variations, and to compare them with respect to their usefulness in 
interpreting radiation mortality data. Although some of these tech- 
niques have been published, their use by experimenters seems to be 
limited, perhaps because of the places of publication or perhaps because 
actuarial notation is unfamiliar to those of us accustomed to the notation 
used in modern distribution theory. In the following sections, the 
notation is necessarily involved but it should be unambiguous. Readers 
interested in estimates of parameters other than the mortality intensity 
should consult the references, particularly the papers by Seal [1954], 
much of whose notation has been adopted, and Grenander [1956]. 


2. Notation and Definitions 


Because observations on a mortality distribution are necessarily 
ordered, it is possible to define certain quantities that have specific 
interpretations related to the ordering. Let 


$M(x)$ = probability that an individual born into a population at
time zero will die on or before age $x$ ($0 \le x < \infty$).


Thus $M(x)$ is the distribution function for the random variable, age
at death. In actuarial use, $1 - M(x) = l_x$, except that this $l_x$ has
been standardized so that $l_0 = 1$. Next let

$$\pi(x, x + T) = \frac{1 - M(x + T)}{1 - M(x)} \qquad (2.1)$$

= probability that an individual alive at time $x$ will also
be alive at time $x + T$.

This is also recognized as the conditional probability of survival to
age $x + T$, given survival to age $x$. In the actuarial literature the
notation ${}_T q_x$ is sometimes used for $1 - \pi(x, x + T)$. Finally, let

$$\mu_x = \frac{1}{1 - M(x)}\,\frac{dM(x)}{dx}, \qquad (2.2)$$

which has several names. In this paper, $\mu_x$ will be called the mortality
intensity following Grenander [1956], although it is also known as the
age-specific death rate, the instantaneous death rate, and the force of
mortality. Intuitively, $\mu_x$ is simply the mortality rate applicable to
the fraction of the population surviving at age $x$. The relation

$$\pi(x, x + T) = \exp\left[-\int_0^T \mu_{x+\tau}\,d\tau\right], \qquad (2.3)$$

which will be useful later, is easily verified.
Of all the functions that have been used to interpret mortality data,
the Gompertz-Makeham equation, one form of which is

$$M(x) = 1 - \exp\left[-\frac{\alpha}{\beta}\left(e^{\beta x} - 1\right)\right], \qquad (2.4)$$

is best known. This function has the property that $\log \mu_x$ is a linear
function of $x$, and when estimates of $\mu_x$ are plotted semilogarithmically,
the resulting curve is often called a Gompertz plot. Plots of actual
mortality data seldom behave linearly over the entire life span but 
in many cases, (2.4) fits well for values of x past the median age at 
death. Gompertz plots have figured prominently in the literature on 
radiation mortality (Furth et al. [1959], Grahn and Sacher [1958], 
Jones [1955, 1957], Sacher [1956]). 
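As a numerical check on (2.1) through (2.4), the sketch below evaluates the Gompertz form of (2.4) with hypothetical parameters `ALPHA` and `BETA` (illustrative values only, not estimates from any experiment). It computes $\pi(x, x + T)$ both from the definition (2.1) and from the integral relation (2.3), and shows that $\log \mu_x$ increases by $\beta$ per unit of age, which is what makes the Gompertz plot straight.

```python
import math

# Hypothetical Gompertz parameters (time unit: weeks); not fitted to any data here.
ALPHA, BETA = 1e-4, 0.03


def gompertz_M(x, alpha=ALPHA, beta=BETA):
    """Distribution function (2.4): M(x) = 1 - exp[-(alpha/beta)(e^{beta x} - 1)]."""
    return 1.0 - math.exp(-(alpha / beta) * math.expm1(beta * x))


def gompertz_mu(x, alpha=ALPHA, beta=BETA):
    """Mortality intensity (2.2) implied by (2.4): mu_x = alpha * e^{beta x}."""
    return alpha * math.exp(beta * x)


def survival_from_M(x, T):
    """pi(x, x + T) from the definition (2.1)."""
    return (1.0 - gompertz_M(x + T)) / (1.0 - gompertz_M(x))


def survival_from_mu(x, T, steps=2000):
    """pi(x, x + T) from relation (2.3); the integral uses the trapezoidal rule."""
    h = T / steps
    inner = sum(gompertz_mu(x + i * h) for i in range(1, steps))
    integral = h * (0.5 * gompertz_mu(x) + inner + 0.5 * gompertz_mu(x + T))
    return math.exp(-integral)


print(survival_from_M(100.0, 20.0), survival_from_mu(100.0, 20.0))
# log mu_x rises by exactly beta per unit of age, hence a straight Gompertz plot.
print(math.log(gompertz_mu(101.0)) - math.log(gompertz_mu(100.0)))
```

The two survival values agree to within the trapezoidal-rule error, which is far below the precision of any mortality data.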

Consider a random sample of $N_1$ individuals from a population
with an unknown distribution function $M(x)$ for the age at death.
Each individual in the sample is observed from birth to death and the
ages at death are recorded. For use in estimation it is convenient
to divide the observation period into $k$ adjacent intervals and to use
the following notation:


$y_i$ = number of observed deaths in $i$th interval ($i = 1, \dots, k$),
$x_i, x_{i+1}$ = boundaries of $i$th interval,
$T_i = x_{i+1} - x_i$ = width of $i$th interval,
$N_i$ = number of individuals alive at $x_i$,
$x_i + t_{ij}$ = age at death of $j$th individual in $i$th interval
($0 < t_{ij} \le T_i$, $t_{i0} = 0$).


3. Estimates Based on Preassigned Time Intervals 


Two basically different sampling procedures or “stop rules” have
been proposed for the estimation of mortality parameters. Under the
first procedure, the time intervals $T_i$ are chosen arbitrarily, often of
equal width, and each $y_i$ associated with an interval is a random variable.
In fact, the $y_i$ jointly have a $k$-cell multinomial distribution, with cell
probabilities given by $M(x_{i+1}) - M(x_i)$. Under the second procedure,
the $y_i$ are preassigned, or fixed in advance of the experiment, and the
$x_i$ or $T_i$ become the random variables. The first procedure will be
considered in this section.
The classical estimate of $\mu_x$ in actuarial use is

$$\mu^{(1)} = \frac{y_i/T_i}{\tfrac{1}{2}(N_i + N_{i+1})}, \qquad (3.1)$$

i.e., the number of deaths per unit time in the interval divided by the
average number of survivors during the interval. This form is used
when the ages at death within the interval are not known. If they
are known, a somewhat better estimate is obtained by dividing the
number of deaths by the total number of years of life represented over
the interval. In fact, Grenander [1956], by an ingenious argument, has
shown that this is the maximum likelihood estimate of $\mu_x$ when the
mortality intensity is assumed to be nondecreasing. Perhaps it should
be noted that

$$\mu_x = \frac{dM(x)/dx}{1 - M(x)} \approx \frac{[M(x + \tfrac12 T) - M(x - \tfrac12 T)]/T}{\operatorname{avg}[1 - M(x)]} \approx \frac{[M(x + \tfrac12 T) - M(x - \tfrac12 T)]/T}{\tfrac12[2 - M(x + \tfrac12 T) - M(x - \tfrac12 T)]},$$

and that (3.1) is related to this approximation.
The second estimate of $\mu_x$ based on preassigned time intervals has
been proposed by Sacher [1956], who apparently bases his estimate
on the fact that

$$\mu_x = -\frac{d}{dx}\log[1 - M(x)] \approx -\frac{\log[1 - M(x + \tfrac12 T)] - \log[1 - M(x - \tfrac12 T)]}{T} = \frac{1}{T}\log\frac{1 - M(x - \tfrac12 T)}{1 - M(x + \tfrac12 T)}.$$

This leads to the estimate

$$\mu^{(2)} = (1/T_i)\log(N_i/N_{i+1}), \qquad (3.2)$$

which intuitively one might prefer to (3.1) since it arises from a delta
approximation of the entire function rather than just a part of it.
Although the estimates discussed in the next section are probably
superior to (3.1) and (3.2), there may be occasions in which the latter
will be used, for example, when the ages at death within intervals are
not known. For this reason a small numerical study of the biases
in (3.1) and (3.2) was performed and will be described briefly. The
joint frequency function of the $y_i$ is known to be the multinomial,

$$\frac{N_1!}{\prod_{i=1}^{k} y_i!}\prod_{i=1}^{k} p_i^{y_i},$$

where $p_i = M(x_{i+1}) - M(x_i)$. Dropping the $i$ subscript and letting
$N_{i+1} = N$, the estimates (3.1) and (3.2) may be written

$$\mu^{(1)} = \frac{2y}{T(2N + y)}, \qquad \mu^{(2)} = \frac{1}{T}\log\frac{N + y}{N},$$

and it is easy to show that the joint frequency function of $y$ and $N$ is

$$\frac{N_1!}{y!\,N!\,(N_1 - y - N)!}\,p^{y}Q^{N}(1 - p - Q)^{N_1 - y - N}, \qquad (3.3)$$

where $p_i = p$ and $Q = \sum_{j=i+1}^{k} p_j$. It is clear that both estimates have
expectations that do not exist if the random variables are allowed to
vary over their entire ranges (for example, $\mu^{(2)}$ is infinite when $N = 0$),
and it follows that (3.3) should be truncated. Fortunately, for the sample
sizes used in the numerical study ($50 \le N_1 \le 500$), the effect of truncation
is negligible and may be ignored. By expanding $\mu^{(2)}$ in series, it may be
shown that $E(\mu^{(2)}) > E(\mu^{(1)})$, but, since the bias in $\mu^{(1)}$ is often negative,
this fact is useful only for checking numerical results.
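The two interval estimates can be sketched as follows, for the case in which every animal is followed to death and the interval boundaries are preassigned. The ages and boundaries in the example are hypothetical. The last assertion reflects the pointwise inequality $\log(1 + r) \ge 2r/(2 + r)$ for $r = y/N \ge 0$, which is one way to see why $\mu^{(2)} \ge \mu^{(1)}$ for every sample and hence $E(\mu^{(2)}) > E(\mu^{(1)})$.

```python
import math


def interval_counts(ages_at_death, boundaries):
    """Tabulate (y_i, N_i, N_{i+1}, T_i) for the intervals (x_i, x_{i+1}].

    ages_at_death: complete ages at death (every animal followed until death).
    boundaries:    preassigned x_1 < x_2 < ... < x_{k+1} (first stop rule).
    """
    rows = []
    for lo, hi in zip(boundaries[:-1], boundaries[1:]):
        alive_lo = sum(1 for a in ages_at_death if a > lo)   # N_i
        alive_hi = sum(1 for a in ages_at_death if a > hi)   # N_{i+1}
        rows.append((alive_lo - alive_hi, alive_lo, alive_hi, hi - lo))
    return rows


def mu1(y, N_next, T):
    """Estimate (3.1): 2y / [T(2N + y)], with N = N_{i+1}."""
    if y == 0 and N_next == 0:
        return math.nan          # nobody at risk in the interval
    return 2.0 * y / (T * (2.0 * N_next + y))


def mu2(y, N_next, T):
    """Estimate (3.2): (1/T) log[(N + y)/N]; infinite when nobody survives the interval."""
    return math.inf if N_next == 0 else math.log((N_next + y) / N_next) / T


# Hypothetical ages at death (weeks), for illustration only.
ages = [55, 70, 72, 90, 95, 101, 110, 115, 120, 130]
for y, N_i, N_next, T in interval_counts(ages, [40, 80, 100, 120, 140]):
    print(y, N_i, mu1(y, N_next, T), mu2(y, N_next, T))
```

In the last interval no animal survives, so $\mu^{(2)}$ is infinite while $\mu^{(1)}$ reduces to $2/T$; this is the truncation problem mentioned above.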

The biases in $\mu^{(1)}$ and $\mu^{(2)}$ were evaluated by expanding the estimates
in Taylor’s series about $E(y)$ and $E(N)$ up to terms of fourth degree
in y and N plus the remainder. The expected values of the series 
were then computed, and an attempt was made to find upper bounds 
for the contribution of the remainder term. Although rigorous bounds 
could not be found, the successive terms of the series as evaluated left 
little doubt as to the adequacy of four terms for most of the arguments 
employed. Central product-moments of y and N up to the fourth 
order were found by the method described by Rao [1952]. 

To provide a basis for the numerical work, a Gompertz-Makeham
mortality function was fitted to the second half of the mortality data
for a group of male mice exposed to 240r of gamma radiation (Furth
et al. [1959]) and observed until death. The estimates from this fit
were then used as parameters in the numerical study of bias. Biases
were computed for all combinations of $N_1$ and $T$ for $N_1$ = 50, 75, 100,
150, 200, 300, 500 and $T$ = 10, 15, 20, 30 (weeks), and for ages at
death from a point just before the median age at death (126 weeks)
to a point corresponding to about 95 percent mortality.
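The study above used fourth-order Taylor expansions. A different technique, shown here only as a cross-check, is to sum the trinomial (3.3) directly, which gives the expectations of $\mu^{(1)}$ and $\mu^{(2)}$ for one interval exactly. Two assumptions are involved. First, the Gompertz parameters `ALPHA` and `BETA` are hypothetical stand-ins, because the fitted values are not reported here. Second, the truncation keeps only samples with $N \ge 1$, which is one possible rule that keeps both estimates finite.

```python
import math

# Hypothetical Gompertz parameters standing in for the fitted values.
ALPHA, BETA = 1e-4, 0.03


def gompertz_M(x):
    return 1.0 - math.exp(-(ALPHA / BETA) * math.expm1(BETA * x))


def expected_mu1_mu2(N1, x_lo, T):
    """E(mu1), E(mu2) for the interval (x_lo, x_lo + T], summing over the trinomial (3.3).

    The distribution is truncated to N >= 1 (at least one survivor at x_lo + T).
    """
    M_lo, M_hi = gompertz_M(x_lo), gompertz_M(x_lo + T)
    p, Q, r = M_hi - M_lo, 1.0 - M_hi, M_lo      # interval death, survivor, earlier death
    lp, lQ, lr = math.log(p), math.log(Q), math.log(r)
    c = math.lgamma(N1 + 1)
    mass = e1 = e2 = 0.0
    for y in range(N1 + 1):
        for N in range(1, N1 - y + 1):
            rest = N1 - y - N
            w = math.exp(c - math.lgamma(y + 1) - math.lgamma(N + 1) - math.lgamma(rest + 1)
                         + y * lp + N * lQ + rest * lr)
            mass += w
            e1 += w * 2.0 * y / (T * (2.0 * N + y))       # (3.1)
            e2 += w * math.log((N + y) / N) / T            # (3.2)
    return e1 / mass, e2 / mass


N1, x_lo, T = 50, 140.0, 20.0
true_mid = ALPHA * math.exp(BETA * (x_lo + T / 2))   # mu at the interval mid-point
E1, E2 = expected_mu1_mu2(N1, x_lo, T)
print(f"relative bias: mu1 {E1 / true_mid - 1:+.3%}, mu2 {E2 / true_mid - 1:+.3%}")
```

The double loop has about $N_1^2/2$ terms, so even $N_1 = 500$ takes well under a second.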

From the results it was clear that the bias in both estimates is a
function of all three quantities, $N_1$, $T$, and $x$. For $x$ in the vicinity
of $E(x)$, the biases were small, even for $N_1 = 50$, varying from about
1 percent for $T = 10$ to 5 percent for $T = 30$, but for large $x$, biases
reached and even exceeded 10 percent. As $N_1$ was increased, the bias
decreased proportionately more for small $T$ than for large, particularly
in the case of $\mu^{(1)}$. Whereas $\mu^{(2)}$ was almost always positively biased,
$\mu^{(1)}$ tended toward negative bias as $x$ was increased. The bias in both
estimates varied with $T$; in $\mu^{(2)}$ it increased monotonically, and in $\mu^{(1)}$
it sometimes increased monotonically and at other times began positive
and decreased through zero to negative values. Strictly speaking,
these results apply only to the mortality function used in the
computations; nevertheless, the parameters are typical of those encountered
in radiation mortality experiments with mice, and it seems safe to
conclude that either estimate is capable of leading to a distorted picture
of the Gompertz plot, with neither estimate obviously preferable to
the other. (This point will be discussed further in Section 5.) In
large samples, one might argue that, since $\mu^{(1)}$ is essentially the maximum
likelihood estimate, it would have the most desirable large-sample
properties, but in many animal experiments, the samples are not large
enough to justify this argument.


4. Estimates Based on Preassigned Numbers of Deaths 


The “stop rule” for this estimation procedure, the original proposal 
for which has been attributed to Moran [1951], is based on the numbers 
of deaths observed rather than on the attainment of predetermined 
time limits. Specifically, if 100 animals are available for an experiment 
and it is desired to estimate $\mu_x$ at five points, five time intervals would
be chosen corresponding possibly to the deaths of the first 20, second 20,
etc., animals. Whether or not the numbers of deaths per interval
are chosen equal, they are fixed in advance and the boundaries of the
intervals are random variables. Based on this sampling procedure,
Seal [1954] has given an unbiased estimate of $\mu_x$, the derivation of
which will be reproduced here in slightly more detail since later a 
modification will be proposed. The notation used is explained in 
Section 2. 

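Let $\tau_{i,j+1}$ be the random variable corresponding to the time of death
of the $(j + 1)$st individual in the $i$th interval, and let $t'$ and $t''$
($0 \le t' < t'' \le T_i$) be arbitrary numbers. Then from (2.3),

$$\operatorname{Prob}\{\tau_{i,j+1} > t'' \mid \tau_{i,j+1} > t'\} = \exp\left[-(N_i - j)\int_{t'}^{t''}\mu_{x_i+\tau}\,d\tau\right],$$

whereupon

$$\operatorname{Prob}\{\tau_{i,j+1} \le t'' \mid \tau_{i,j+1} > t'\} = 1 - \exp\left[-(N_i - j)\int_{t'}^{t''}\mu_{x_i+\tau}\,d\tau\right]. \qquad (4.1)$$

If $t' = t_{ij}$, (4.1) is immediately recognized as an exponential distribution
function for the random variable

$$u_{ij} = (N_i - j)\int_{t_{ij}}^{\tau_{i,j+1}}\mu_{x_i+\tau}\,d\tau, \qquad 0 < u_{ij} < \infty,$$

with the corresponding frequency function $g(u_{ij}) = \exp[-u_{ij}]$, from
which it is easy to show that the sample value $2u_{ij}$, evaluated at
$\tau_{i,j+1} = t_{i,j+1}$, has a $\chi^2$ distribution with 2 degrees of freedom. Thus

$$2Q_i = 2\sum_{j=0}^{y_i - 1}(N_i - j)\int_{t_{ij}}^{t_{i,j+1}}\mu_{x_i+\tau}\,d\tau \qquad (4.2)$$

is distributed as $\chi^2$ with $2y_i$ degrees of freedom. If now $T_i$ is small
enough so that one may assume

$$\mu_{x_i+\tau} = \mu_i, \qquad 0 \le \tau \le T_i, \qquad (4.3)$$

(4.2) becomes

$$2Q_i = 2\mu_i\sum_{j=0}^{y_i-1}(N_i - j)(t_{i,j+1} - t_{ij}) = 2\mu_i\left[\sum_{j=1}^{y_i} t_{ij} + (N_i - y_i)T_i\right], \qquad (4.4)$$

the second form following because the interval ends at the $y_i$th death, so
that $t_{i,y_i} = T_i$. Because (4.4) is a parameter ($\mu_i$) multiplied by a random
variable and is distributed as $\chi^2$, Seal makes use of a general result to show
that, if

$$\mu^{(3)}_i = \frac{y_i - 1}{\sum_{j=1}^{y_i} t_{ij} + (N_i - y_i)T_i}, \qquad (4.5)$$

then $E(\mu^{(3)}_i) = \mu_i$, i.e., (4.5) is an unbiased estimate of $\mu_i$. Theorems by
Lehmann and Scheffé [1950] may be used to show that (4.5) is also
efficient and complete. The variance of $\mu^{(3)}_i$ is shown to be approximately
$\mu_i^2/(y_i - 2)$.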
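Seal's estimate (4.5) and its unbiasedness under assumption (4.3) can be checked by simulation. The sketch below applies the second stop rule to hypothetical values of $N_i$, $y_i$ and a constant intensity `mu`: it follows $N_i$ exponential lifetimes until the $y_i$th death and then averages (4.5) over many replicates. The sample mean should sit close to `mu`, and the sample variance close to $\mu^2/(y_i - 2)$.

```python
import random


def seal_mu3(t, N_i):
    """Estimate (4.5) from the within-interval death times t_{i1} <= ... <= t_{i,y_i} = T_i."""
    y = len(t)
    exposure = sum(t) + (N_i - y) * t[-1]      # total time lived in the interval
    return (y - 1) / exposure


def simulate_stop_rule(N_i, y_i, mu, rng):
    """Second stop rule with constant intensity mu (assumption (4.3)):
    follow N_i animals until the y_i-th death and return the sorted death times."""
    return sorted(rng.expovariate(mu) for _ in range(N_i))[:y_i]


rng = random.Random(2024)
N_i, y_i, mu = 60, 15, 0.02                    # hypothetical values
draws = [seal_mu3(simulate_stop_rule(N_i, y_i, mu, rng), N_i) for _ in range(10000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(mean, mu)                      # mean close to mu (unbiasedness)
print(var, mu ** 2 / (y_i - 2))      # sample variance close to mu^2/(y_i - 2)
```

Because exponential lifetimes are memoryless, starting every animal at $x_i$ is equivalent to assuming a constant intensity over the interval.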

Assumption (4.3), although suitable for some purposes, is not
consistent with most data on mortality intensities, which indicate that $\mu_x$
is usually a nondecreasing function of $x$, particularly as $x$ approaches
and passes $E(x)$. Seal recognizes this and, in proceeding to obtain
estimates of $\pi(x, x + T)$, which is the main objective of his paper,
he assumes a uniform distribution of deaths over the interval $(0, T)$.
Unfortunately this assumption does not lead to a simple estimate of
$\mu_x$, and some alternative is indicated.

If it is assumed that $\mu_{x_i+\tau}$ increases linearly with $\tau$ over the interval
for which $\mu_x$ is being estimated, then

$$\mu_{x_i+\tau} = \mu_i + \frac{\tau}{T_i}(\mu_{i+1} - \mu_i), \qquad 0 \le \tau \le T_i, \qquad (4.6)$$

where $\mu_i = \mu_{x_i}$ and $\mu_{i+1} = \mu_{x_{i+1}}$. Essentially, this amounts to linear
interpolation over the range of $\tau$, and should be reasonably accurate,
particularly for data obeying the Gompertz-Makeham law. Under this assumption


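$$\int_{t_{ij}}^{t_{i,j+1}}\mu_{x_i+\tau}\,d\tau = \mu_i\left[(t_{i,j+1} - t_{ij}) - \frac{t_{i,j+1}^2 - t_{ij}^2}{2T_i}\right] + \mu_{i+1}\,\frac{t_{i,j+1}^2 - t_{ij}^2}{2T_i}. \qquad (4.7)$$

If

$$A_i = \sum_{j=0}^{y_i-1}(N_i - j)\left[(t_{i,j+1} - t_{ij}) - \frac{t_{i,j+1}^2 - t_{ij}^2}{2T_i}\right], \qquad B_i = \sum_{j=0}^{y_i-1}(N_i - j)\,\frac{t_{i,j+1}^2 - t_{ij}^2}{2T_i}$$

(noting that $\mu_{x_i + T_i} = \mu_{i+1}$), then (4.2) becomes

$$Q_i = A_i\mu_i + B_i\mu_{i+1}, \qquad i = 1, \dots, k.$$

Since $2Q_i$ is distributed as $\chi^2$ with $2y_i$ degrees of freedom, the frequency
function for $Q_i$ is

$$f(Q_i) = \frac{Q_i^{\,y_i - 1}e^{-Q_i}}{\Gamma(y_i)}, \qquad (4.8)$$

whereupon the logarithm of the likelihood may be written

$$L = (\text{constant}) + \sum_{i=1}^{k}\left[(y_i - 1)\log Q_i - Q_i\right].$$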
Differentiating with respect to $\mu_a$,

$$\frac{\partial L}{\partial \mu_a} = \begin{cases} A_1[(y_1 - 1)/Q_1] - A_1, & a = 1,\\ A_a[(y_a - 1)/Q_a] + B_{a-1}[(y_{a-1} - 1)/Q_{a-1}] - (A_a + B_{a-1}), & 2 \le a \le k,\\ B_k[(y_k - 1)/Q_k] - B_k, & a = k + 1. \end{cases}$$

If these partial derivatives are set equal to zero, the resulting equations,
viewed as functions of the unknowns

$$\theta_a = (y_a - 1)/Q_a, \qquad 1 \le a \le k,$$

are linear and have the unique solution $\theta_a = 1$ for all $a$. Thus the
maximum likelihood equations are equivalent to the equations

$$Q_a = y_a - 1, \qquad (4.9)$$

which are linear in the $\mu_a$, $1 \le a \le k + 1$.
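The coefficients $A_i$ and $B_i$ depend only on the observed death times in one interval, so they can be computed directly from the definitions above. In the sketch below the interval data are hypothetical. The test also checks two identities that follow from telescoping the sums: $B_i = [\sum_j t_{ij}^2 + (N_i - y_i)T_i^2]/(2T_i)$ and $A_i + B_i = \sum_j t_{ij} + (N_i - y_i)T_i$. It further checks that $Q_i = A_i\mu_i + B_i\mu_{i+1}$ equals the integral in (4.2) under the linear intensity (4.6).

```python
def interval_coefficients(t, N_i):
    """A_i and B_i of Section 4 for one interval under the linear assumption (4.6).

    t:   within-interval death times t_{i1} <= ... <= t_{i,y_i}; the interval
         ends at the y_i-th death, so T_i = t[-1].
    N_i: number alive at the start of the interval.
    """
    T = t[-1]
    A = B = 0.0
    prev = 0.0                                    # t_{i0} = 0
    for j, tj in enumerate(t):                    # tj is t_{i,j+1}; N_i - j are at risk
        at_risk = N_i - j
        b_part = at_risk * (tj ** 2 - prev ** 2) / (2.0 * T)
        A += at_risk * (tj - prev) - b_part
        B += b_part
        prev = tj
    return A, B


def Q_value(A, B, mu_lo, mu_hi):
    """Q_i = A_i mu_i + B_i mu_{i+1}; the ML equation (4.9) sets this equal to y_i - 1."""
    return A * mu_lo + B * mu_hi


# Hypothetical interval: 20 animals alive at x_i, deaths 2, 5, 6 and 10 weeks into it.
A, B = interval_coefficients([2.0, 5.0, 6.0, 10.0], 20)
print(A, B)
```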


Since this set of $k$ equations contains $k + 1$ unknowns, the system
is underdetermined, but in most cases the difficulty may be met
satisfactorily. If the first two intervals are chosen to be located somewhat
before the median age at death, assumption (4.3) will usually be more
reasonable and (4.5) can be used to estimate $\mu_x$ at the mid-points.
These values may then be used to extrapolate logarithmically to the
lower boundary of the first interval for which assumption (4.6) is to
be made. Specifically, let $m_1'$ and $m_2'$ be the estimates of $\mu_x$ from (4.5)
for the first two intervals with mid-points $z_1$ and $z_2$, respectively. If
$x_r$ is identified as the lower boundary of the first interval for which
assumption (4.6) is made, the estimate of $\mu_x$ at this point is defined by


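$$\log \mu^{(4)}_{x_r} = \log m_2' + \frac{x_r - z_2}{z_2 - z_1}\left(\log m_2' - \log m_1'\right), \qquad (4.10)$$

and its variance is approximately

$$\operatorname{Var}\left(\mu^{(4)}_{x_r}\right) \approx \left(\mu^{(4)}_{x_r}\right)^2\left[\frac{(x_r - z_2)^2}{(z_2 - z_1)^2(y_1 - 2)} + \frac{(x_r - z_1)^2}{(z_2 - z_1)^2(y_2 - 2)}\right].$$

Using this estimate of $\mu_{x_r}$, one may obtain estimates of the remaining
$\mu_{x_i}$ recursively from

$$\mu^{(4)}_{x_{i+1}} = \frac{(y_i - 1) - A_i\,\mu^{(4)}_{x_i}}{B_i}, \qquad i = r, r + 1, \dots, k, \qquad (4.11)$$

and the estimates at the mid-points are given by

$$\mu^{(4)}_{x_i + T_i/2} = \tfrac{1}{2}\left[\mu^{(4)}_{x_i} + \mu^{(4)}_{x_{i+1}}\right]. \qquad (4.12)$$

In the appendix it is shown that the asymptotic variances are

$$\operatorname{Var}\left(\mu^{(4)}_{x_{i+1}}\right) = \frac{y_i}{B_i^2} + \frac{A_i^2}{B_i^2}\operatorname{Var}\left(\mu^{(4)}_{x_i}\right)$$

and

$$\operatorname{Var}\left(2\mu^{(4)}_{x_i + T_i/2}\right) = \frac{y_i}{B_i^2} + \left(\frac{A_i}{B_i} - 1\right)^2\operatorname{Var}\left(\mu^{(4)}_{x_i}\right).$$

In practice one would estimate the variances of the $\mu^{(4)}_{x_i}$ recursively
and then compute the variances of the mid-point estimates.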
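The recursive procedure of (4.10) through (4.12), together with the variance recursions, can be sketched as below. All inputs are hypothetical, and the observed $A_i$, $B_i$ are used in place of their expectations. The variance attached to the log-linear extrapolation is a delta-method sketch: it assumes $\operatorname{Var}(m') \approx m'^2/(y - 2)$ for each (4.5) estimate and independent intervals. The checks confirm that each neighbouring pair of boundary estimates satisfies (4.9), and that the mid-point variance equals $\operatorname{Var}(a_i) + \operatorname{Var}(a_{i+1}) + 2\operatorname{Cov}(a_i, a_{i+1})$ with $\operatorname{Cov} = -(A_i/B_i)\operatorname{Var}(a_i)$.

```python
import math


def log_extrapolate(m1, y1, z1, m2, y2, z2, x_r):
    """(4.10): extend the straight line through (z1, log m1) and (z2, log m2) to x_r.

    Variance: delta-method sketch assuming Var(m) ~ m^2/(y - 2) and independent intervals.
    """
    c2 = (x_r - z1) / (z2 - z1)
    c1 = 1.0 - c2                                  # = -(x_r - z2)/(z2 - z1)
    a = math.exp(c1 * math.log(m1) + c2 * math.log(m2))
    return a, a * a * (c1 ** 2 / (y1 - 2) + c2 ** 2 / (y2 - 2))


def mu4_recursion(a_r, var_a_r, intervals):
    """Boundary estimates by (4.11), mid-points by (4.12), variances as in the appendix.

    intervals: list of (y_i, A_i, B_i) for i = r, r + 1, ..., k.
    """
    a, var_a = a_r, var_a_r
    bounds, bound_vars, mids, mid_vars = [a], [var_a], [], []
    for y, A, B in intervals:
        a_next = ((y - 1) - A * a) / B                                      # (4.11)
        var_next = y / B ** 2 + (A / B) ** 2 * var_a                        # (A.1)
        mids.append(0.5 * (a + a_next))                                     # (4.12)
        mid_vars.append(0.25 * (y / B ** 2 + (A / B - 1.0) ** 2 * var_a))   # (A.4) / 4
        bounds.append(a_next)
        bound_vars.append(var_next)
        a, var_a = a_next, var_next
    return bounds, bound_vars, mids, mid_vars


# Hypothetical inputs for illustration only.
a_r, var_a_r = log_extrapolate(0.002, 20, 60.0, 0.006, 20, 90.0, 100.0)
intervals = [(21, 1000.0, 900.0), (20, 700.0, 600.0)]
bounds, bound_vars, mids, mid_vars = mu4_recursion(a_r, var_a_r, intervals)
print(mids, [math.sqrt(v) for v in mid_vars])
```

Because each boundary estimate feeds the next, any error in the extrapolated starting value propagates through the whole chain. The $(A_i/B_i)^2$ factor in the recursion shows how quickly that happens.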

An obvious alternative to assumption (4.6) would be the assumption
that $\log \mu_{x_i+\tau}$ increases linearly with $\tau$ over the interval $(0, T_i)$, since
this corresponds exactly with the Gompertz-Makeham law often found 
to operate past middle age. Although this has intuitive appeal, it 
leads to difficult estimation equations and is not likely to be worth 
the extra effort. 
5. An Illustrative Example


To illustrate the four methods of estimation that have been discussed
in Sections 3 and 4, data from the previously mentioned experiment
(Furth et al. [1959]) were chosen. They represent only a small portion
of a large experiment in which mice were exposed to radiation from a
nuclear detonation and the survivors brought to the Biology Division
of Oak Ridge National Laboratory to be individually caged and observed
until natural death without further treatment. The particular group
used in this illustration was exposed to an average dose of 240r of gamma
radiation. Of 220 male animals exposed, 10 died within 28 days and
the remainder were shipped to Oak Ridge. Two of the survivors were
subsequently omitted from the longevity study because of improper
identification. At the time of exposure all mice were between 6 and
12 weeks of age. The ages at death reported here are actually measured
in weeks after exposure to radiation.

The frequency distribution of ages at death is shown in Table 1.
In the report by Furth et al. [1959], the mortality intensities for these
data were plotted by selecting equally spaced intervals for age at
death and by using $\mu^{(1)}$ in (3.1) to form the estimates. In a previous
report Sacher [1956], using wider intervals and a different grouping
of animals by dose, plotted the same data using $\mu^{(2)}$ in (3.2) as the
estimator. Both of these methods yielded apparently linear Gompertz
plots. When the estimator $\mu^{(3)}$ in (4.5) was used, in all but one of the
ten dose groups for which the calculation was made a characteristic
nonlinear Gompertz plot was obtained. The method of interval
determination, of course, was different. In the 240r male group the 208
animals were divided as equally as possible into 10 groups, ordered
by age at death, and the intervals determined on this basis. The
dotted lines in Table 1 show the results. In Table 2, the estimates
for each mid-point are shown, and for comparison the results from all
four methods are given. In a sense, comparison of $\mu^{(1)}$ and $\mu^{(2)}$ with
$\mu^{(3)}$ and $\mu^{(4)}$ for the same intervals is misleading, since the intervals
are determined by different stop rules. A suggestion that this point
may be important is given by the fact that two previous plots based
on preassigned intervals were essentially linear, whereas $\mu^{(1)}$ and $\mu^{(2)}$
in Table 2 are not.


TABLE 2

ESTIMATES OF MORTALITY INTENSITIES FOR DATA IN TABLE 1
(See text for notation)

Interval          Mortality intensity
midpoint (wk)     $\mu^{(1)}$    $\mu^{(2)}$    $\mu^{(3)}$    $\mu^{(4)}$
 61.0             .00247    .00247    .00235      -
 90.5             .00744    .00744    .00731      -
102.0             .01929    .01933    .01968    .02067
111.0             .01419    .01423    .01415    .01286
121.5             .02044    .02052    .02020    .02233
129.0             .04396    .04414    .04628    .04567
135.5             .03723    .03751    .03763    .03719
143.0             .05375    .05440    .05341    .05610
152.0             .05865    .06083    .06365    .06102
167.5             .10000      -       .10499    .14101

In Figure 1, the estimates based on $\mu^{(3)}$ and $\mu^{(4)}$ are plotted. In
calculating the latter, the first two intervals were used to estimate $\mu_x$
at the lower boundary of the third interval by (4.10). Thus $\mu^{(4)}$
was not calculated for the first two intervals. For this dose group
alone the dip in $\mu_x$ from the sixth to the seventh interval would not
be noteworthy, but since it occurred in all but one of the Gompertz
plots made, it can hardly be overlooked. This point will be discussed
further in a subsequent paper.


6. Discussion 


Of the four estimators discussed in Sections 3 and 4, only one ($\mu^{(3)}$)
is unbiased. Biased estimators are not always undesirable, particularly
if one is describing a functional relation and the bias is constant with
respect to the independent variables. It has been shown, however,
that the biases in $\mu^{(1)}$ and $\mu^{(2)}$ may be dependent on $x$ and hence distort
the Gompertz plot. For this reason $\mu^{(3)}$ is preferable, provided that
the assumption (4.3) on which it is based is appropriate. This is likely
to be true if the sample size is large enough to yield relatively small
values of $T_i$. When this condition is not satisfied, $\mu^{(4)}$, though biased,
may be preferred because the assumption (4.6) on which it is based
agrees better with the observed behavior of mortality intensities than
(4.3), an assumption that is implicit in each of the other three estimators.
Without an extensive numerical sampling study, further elucidation
of these points seems unlikely.

When ages at death within intervals are not available, the choice of
estimators is narrowed to $\mu^{(1)}$ or $\mu^{(2)}$. Certainly the small sampling
study in Section 3 is not sufficient to warrant any general conclusions.
Nevertheless, it seems clear that both estimators have biases that
depend on $x$ and that there is little to recommend one in favor of the
other. If either of these estimators must be used in small samples,
one would be well advised to restrict the region of estimation to ranges
of $x$ in which the bias is small. Although no general rule can be stated,
the results in Section 3 suggest that this range should not exceed the
0 percent or 70 percent point of the sample distribution function.


7. Summary 


The estimation of mortality intensity (also known as age-specific
death rate, instantaneous death rate, and force of mortality) from
small samples presents serious obstacles. Most of the work on this
problem has been published in only the last 10 or 15 years, and has 
dealt primarily with studies of human populations. Standard actuarial 
procedures suitable for large samples have proved inadequate for 
relatively small studies on groups of patients suffering from cancer, 
heart disease, and other major causes of death. Unlike human studies 
in which patients are lost from observation and enter observation at 
different ages, experiments with animals can be controlled with respect 
to these and many other variables. 

In this paper, several estimation procedures are discussed in terms 
of their applicability to controlled animal experiments, and the methods 
are illustrated with data from a radiation experiment with mice. Parti- 
cular emphasis is placed on the construction of Gompertz plots that 
are important in evaluating theories of radiation injury and damage. 
It is found that the applicability of the various procedures depends 
in part on the type of data available and in part on what assumptions 
may be admissible. 


The author gratefully acknowledges many helpful discussions of this 
paper with Drs. G. E. Albert and H. L. Lucas. 
APPENDIX
Asymptotic Variances of $\mu^{(4)}$


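To simplify the notation slightly, let $a_i = \mu^{(4)}_{x_i}$ and $m_i = \mu^{(4)}_{x_i + T_i/2}$.
Then the mid-point estimates are defined by $2m_i = a_i + a_{i+1}$, and

$$a_{i+1} = N_i/B_i = [(y_i - 1) - A_i a_i]/B_i, \qquad i = 1, \dots, k,$$

where, in this appendix, $N_i$ denotes $(y_i - 1) - A_i a_i$, and $a_1$ and $a_{i+1}$ are
given by (4.10) and (4.11). Further, let $\bar N_i = E(N_i)$, $\bar A_i = E(A_i)$,
$\bar B_i = E(B_i)$, whereupon $\mu_{i+1} \approx \bar N_i/\bar B_i$, since in obtaining the asymptotic
variances only linear terms in the series expansions are considered.


The variance of $a_{i+1}$ is the variance of a ratio and may be
approximated by

$$\operatorname{Var}(a_{i+1}) \approx \left[\operatorname{Var}(N_i) + \mu_{i+1}^2\operatorname{Var}(B_i) - 2\mu_{i+1}\operatorname{Cov}(N_i, B_i)\right]/\bar B_i^2.$$

But $\operatorname{Var}(N_i) = \operatorname{Var}(A_i a_i) \approx \bar A_i^2\operatorname{Var}(a_i) + \mu_i^2\operatorname{Var}(A_i)$ and
$\operatorname{Cov}(N_i, B_i) \approx -\mu_i\operatorname{Cov}(A_i, B_i)$. Thus the variance of $a_{i+1}$ is given
approximately by

$$\operatorname{Var}(a_{i+1}) \approx \left[\bar A_i^2\operatorname{Var}(a_i) + \mu_i^2\operatorname{Var}(A_i) + \mu_{i+1}^2\operatorname{Var}(B_i) + 2\mu_i\mu_{i+1}\operatorname{Cov}(A_i, B_i)\right]/\bar B_i^2.$$

The distribution of $Q_i = A_i\mu_i + B_i\mu_{i+1}$ is given by (4.8), and from
this it may be shown that

$$\operatorname{Var}(Q_i) = y_i = \mu_i^2\operatorname{Var}(A_i) + \mu_{i+1}^2\operatorname{Var}(B_i) + 2\mu_i\mu_{i+1}\operatorname{Cov}(A_i, B_i),$$

whereupon

$$\operatorname{Var}(a_{i+1}) = (y_i/\bar B_i^2) + (\bar A_i^2/\bar B_i^2)\operatorname{Var}(a_i). \qquad (A.1)$$


To evaluate $\operatorname{Cov}(a_i, a_{i+1})$, the approximation

$$\frac{N_i}{B_i} \approx \frac{N_i}{\bar B_i} - \frac{\bar N_i}{\bar B_i^2}(B_i - \bar B_i)$$

is used in $a_{i+1} = N_i/B_i$ and in $a_i = N_{i-1}/B_{i-1}$. Noting that $B_i$ is
uncorrelated with $B_{i-1}$ and $N_{i-1}$, the covariance is given by

$$\operatorname{Cov}(a_i, a_{i+1}) = \frac{1}{\bar B_i\bar B_{i-1}}\operatorname{Cov}(N_i, N_{i-1}) - \frac{\bar N_{i-1}}{\bar B_i\bar B_{i-1}^2}\operatorname{Cov}(N_i, B_{i-1}). \qquad (A.2)$$

Since $N_i = (y_i - 1) - A_i N_{i-1}/B_{i-1}$, the approximation

$$N_i \approx (y_i - 1) - \frac{A_i N_{i-1}}{\bar B_{i-1}} + \frac{A_i\bar N_{i-1}}{\bar B_{i-1}^2}(B_{i-1} - \bar B_{i-1})$$

is needed to evaluate $\operatorname{Cov}(N_i, N_{i-1})$ and $\operatorname{Cov}(N_i, B_{i-1})$. Because
$A_i$ is uncorrelated with $N_{i-1}$ and $B_{i-1}$, the covariances are

$$\operatorname{Cov}(N_i, N_{i-1}) = -(\bar A_i/\bar B_{i-1})\operatorname{Var}(N_{i-1}) + (\bar A_i\bar N_{i-1}/\bar B_{i-1}^2)\operatorname{Cov}(N_{i-1}, B_{i-1}),$$

$$\operatorname{Cov}(N_i, B_{i-1}) = -(\bar A_i/\bar B_{i-1})\operatorname{Cov}(N_{i-1}, B_{i-1}) + (\bar A_i\bar N_{i-1}/\bar B_{i-1}^2)\operatorname{Var}(B_{i-1}).$$

Then, by substitution in (A.2), with $\mu_i \approx \bar N_{i-1}/\bar B_{i-1}$,

$$\begin{aligned}
\operatorname{Cov}(a_i, a_{i+1}) &= -\frac{\bar A_i}{\bar B_i\bar B_{i-1}^2}\left[\operatorname{Var}(N_{i-1}) - 2\mu_i\operatorname{Cov}(N_{i-1}, B_{i-1}) + \mu_i^2\operatorname{Var}(B_{i-1})\right]\\
&= -\frac{\bar A_i}{\bar B_i\bar B_{i-1}^2}\left[\bar A_{i-1}^2\operatorname{Var}(a_{i-1}) + \mu_{i-1}^2\operatorname{Var}(A_{i-1}) + 2\mu_{i-1}\mu_i\operatorname{Cov}(A_{i-1}, B_{i-1}) + \mu_i^2\operatorname{Var}(B_{i-1})\right]\\
&= -\frac{\bar A_i}{\bar B_i\bar B_{i-1}^2}\left[y_{i-1} + \bar A_{i-1}^2\operatorname{Var}(a_{i-1})\right]. \qquad (A.3)
\end{aligned}$$

Finally, utilizing the results in (A.1) and (A.3), the asymptotic variance
of the mid-point estimate is given by

$$\begin{aligned}
\operatorname{Var}(2m_i) &= \operatorname{Var}(a_i) + \operatorname{Var}(a_{i+1}) + 2\operatorname{Cov}(a_i, a_{i+1})\\
&= \operatorname{Var}(a_i) + \frac{y_i}{\bar B_i^2} + \frac{\bar A_i^2}{\bar B_i^2}\operatorname{Var}(a_i) - \frac{2\bar A_i}{\bar B_i\bar B_{i-1}^2}\left[y_{i-1} + \bar A_{i-1}^2\operatorname{Var}(a_{i-1})\right]\\
&= \frac{y_i}{\bar B_i^2} + \left(\frac{\bar A_i}{\bar B_i} - 1\right)^2\operatorname{Var}(a_i), \qquad (A.4)
\end{aligned}$$

the last step following because, by (A.1), $[y_{i-1} + \bar A_{i-1}^2\operatorname{Var}(a_{i-1})]/\bar B_{i-1}^2 = \operatorname{Var}(a_i)$.

As in all asymptotic variance formulas, one would use the observed
$A_i$ and $B_i$ in (A.1) and (A.4) for their expected values, which are
unknown.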
FITTING THE POISSON BINOMIAL DISTRIBUTION¹


ROBERT SHUMWAY AND JOHN GURLAND


Iowa State University of Science and Technology
Ames, Iowa, U.S.A.


1. Introduction 


One of the most important problems arising in the fitting of dis- 
tributions is the estimation of the parameters which are involved. In 
many cases, the application of either the method of moments or the 
method of sample frequencies will give fairly simple equations for 
obtaining estimates. As these methods will often give unsatisfactory 
fittings, it is desirable to estimate the parameters by an efficient method 
such as maximum likelihood whenever this is possible. This method 
has been used to fit the Neyman Type A distribution by Shenton [6], 
Douglas [1], and to fit the Poisson binomial by Sprott [7]. 

The purpose of this paper is to describe a much easier procedure
for getting maximum likelihood estimates and computing probabilities 
in the fitting of the Poisson binomial distribution. The idea is similar 
to that employed by Douglas [1]. A table is provided which allows the 
computations for obtaining maximum likelihood estimates to be con- 
siderably simplified and shortened. 


2. Maximum likelihood estimators 


The probabilities for the Poisson binomial distribution can be
written (cf. Sprott [7]) as

$$P[X = k] = \sum_{i=0}^{\infty}\frac{e^{-a}a^i}{i!}\binom{ni}{k}p^k q^{ni-k}, \qquad k = 0, 1, 2, \dots, \qquad (1)$$

where $n$ is a known positive integer and $a$ and $p$ are parameters to be
estimated ($q = 1 - p$). The ordinary recurrence relation is given by


A proof of some general recurrence relations of which this is a special
case may be found in Katti and Gurland [3]. Two main difficulties
arise when one is working with this relation. First, the formula itself
is long and the computation of the successive probabilities is tedious
because each probability depends on all the preceding ones. This
dependence also leads to the accumulation of errors, which may seriously
affect the later probabilities.

¹This research was supported by the United States Air Force Contract No. AF 49(638)43,
monitored by the Air Force Office of Scientific Research of the Air Research and
Development Command.

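However, it is possible to develop a simpler recurrence relation by
writing $P(k)$ in the form

$$P(k) = \frac{e^{-a}}{k!}\left(\frac{p}{q}\right)^k\sum_{t=0}^{\infty}\frac{\lambda^t}{t!}(nt)^{(k)}, \qquad (3)$$

where

$$\lambda = aq^n \qquad (4)$$

and

$$(nt)^{(k)} = nt(nt - 1)(nt - 2)\cdots(nt - k + 1), \qquad (5)$$

and noting that

$$\sum_{t=0}^{\infty}e^{-\lambda}\frac{\lambda^t}{t!}(nt)^{(k)} = E[(nX)^{(k)}] = \mu'_{[k]}, \qquad (6)$$

where $X$ is a Poisson random variable with mean $\lambda$. Thus, (6) is the
$k$th factorial moment of the variable $nX$. From (3) and (6) we can
write

$$P(k) = e^{\lambda - a}\left(\frac{p}{q}\right)^k\frac{\mu'_{[k]}}{k!}. \qquad (7)$$

Consequently

$$P(k + 1) = \frac{p\,p_{k,\lambda}}{q(k + 1)}\,P(k), \qquad (8)$$

where

$$p_{k,\lambda} = \mu'_{[k+1]}/\mu'_{[k]} \qquad (9)$$

is the ratio of the two factorial moments corresponding to the variable
$nX$. The recurrence relation (8) is much simpler to use than (2) if we
have tables giving values of $p_{k,\lambda}$ for different values of $\lambda$.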
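The recurrence (8) can be checked against the defining series (1). The sketch below computes the factorial moments (6) by summing the Poisson series directly, starts from $P(0) = e^{\lambda - a}$ given by (7), and then applies (8) with $p_{k,\lambda}$ from (9). The parameters `a`, `p` and `n` are hypothetical, the series are truncated at `terms` terms (an assumption that is harmless for small $\lambda$), and `math.comb` / `math.perm` require Python 3.8 or later.

```python
import math


def pmf_series(k, a, p, n, terms=200):
    """P[X = k] summed directly from (1)."""
    q = 1.0 - p
    total = 0.0
    for i in range(terms):
        if n * i >= k:
            poisson = math.exp(-a + i * math.log(a) - math.lgamma(i + 1))
            total += poisson * math.comb(n * i, k) * p ** k * q ** (n * i - k)
    return total


def factorial_moment(k, lam, n, terms=200):
    """mu'_[k] = E[(nX)^(k)] for X ~ Poisson(lam), as in (6)."""
    return sum(math.exp(-lam + t * math.log(lam) - math.lgamma(t + 1)) * math.perm(n * t, k)
               for t in range(terms))


def pmf_recurrence(kmax, a, p, n):
    """P(0), ..., P(kmax) from (7) for k = 0 and then the recurrence (8)."""
    q = 1.0 - p
    lam = a * q ** n                                       # (4)
    m = [factorial_moment(k, lam, n) for k in range(kmax + 1)]
    probs = [math.exp(lam - a) * m[0]]                     # (7) with k = 0
    for k in range(kmax):
        p_k = m[k + 1] / m[k]                              # p_{k,lambda}, (9)
        probs.append(probs[-1] * p * p_k / (q * (k + 1)))  # (8)
    return probs


probs = pmf_recurrence(10, a=1.5, p=0.3, n=2)              # hypothetical parameters
print([round(x, 6) for x in probs])
```

Each step of (8) uses only the previous probability and one tabulated ratio. This is what removes the error accumulation described above for the ordinary recurrence.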

The likelihood equations as given by Sprott [7] can be expressed
simply in terms of the $p_{k,\lambda}$ values. The likelihood equations can be
written

$$n\hat a\hat p = \bar x, \qquad (10)$$

$$L(\hat p) = \sum_k n_k F(k) - N = 0, \qquad (11)$$

where $\hat a$ and $\hat p$ are maximum likelihood estimates of $a$ and $p$ and $\bar x$ is
the sample mean. In $L(\hat p)$ the $n_k$ are the observed frequencies and
$N$ is the sample size. Also

$$F(k) = (k + 1)P(k + 1)/[n\hat a\hat p\,P(k)]. \qquad (12)$$

After substituting from equation (8) into equation (12), we have

$$F(k) = p_{k,\hat\lambda}/(n\hat a\hat q), \qquad (13)$$


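which leads to the likelihood equation for the parameter $p = 1 - q$,

$$L(p) = \frac{1}{naq}\sum_k n_k p_{k,\lambda} - N = 0, \qquad (14)$$

where, by (10), $a = \bar x/(np)$ and, by (4), $\lambda = aq^n$.
Sprott [7] has shown that differentiating $L(p)$ leads to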
“Ue — 1 4) | 
= > (i + AF(k) (15) 
If we substitute for $F(k)$ in (15), we will obtain
1 


where 


= — (17) 


Thus, if $\hat p_i$ is an $i$th estimate of $p$, we may calculate $L(\hat p_i)$ and $L'(\hat p_i)$
from (14) and (16). Newton’s formula tells us that a closer
approximation $\hat p_{i+1}$ can be computed by using the relation

$$\hat p_{i+1} = \hat p_i - L(\hat p_i)/L'(\hat p_i). \qquad (18)$$

The iterative procedure, then, is to estimate $\hat p_1$ by some relatively
simple method such as sample moments or frequencies. Equation (18)
will then yield a value for $\hat p_2$, which may be substituted into (14) and
(16) to obtain $L(\hat p_2)$ and $L'(\hat p_2)$; $\hat p_3$ will then be given by (18). Iteration
is discontinued when no substantial change is produced in the estimates.
These formulas will yield a quick and efficient fitting if we have the
$p_{k,\lambda}$ and $q_{k,\lambda}$ values readily available. The tables for these values are
described in the following section.
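The iteration just described can be sketched in a few lines. This is not the authors' tabular procedure, and it departs from the text in two labelled ways. First, $L'(p)$ is a central finite difference standing in for the analytic derivative (16). Second, the starting value uses the moment relation variance/mean $= 1 + (n - 1)p$, one of the "relatively simple" methods the text allows. The $p_{k,\lambda}$ are computed exactly from factorial moments instead of being read from a table, and the frequencies are hypothetical.

```python
def falling_coeffs(kmax, n):
    """c[k][r] with (nX)^(k) = sum_r c[k][r] X^(r), so that mu'_[k] = sum_r c[k][r] lam^r."""
    c = [[1]]
    for k in range(kmax):
        prev = c[-1]
        nxt = [0] * (len(prev) + 1)
        for r, coef in enumerate(prev):
            nxt[r + 1] += n * coef           # (nX - k) X^(r) = n X^(r+1) + (n r - k) X^(r)
            nxt[r] += (n * r - k) * coef
        c.append(nxt)
    return c


def L_of_p(p, freqs, n, c):
    """Left side of (14): (1/(naq)) sum_k n_k p_{k,lam} - N, with a from (10) and lam from (4)."""
    N = sum(freqs)
    xbar = sum(k * f for k, f in enumerate(freqs)) / N
    q = 1.0 - p
    a = xbar / (n * p)                                   # (10)
    lam = a * q ** n                                     # (4)
    m = [sum(coef * lam ** r for r, coef in enumerate(row)) for row in c]
    return sum(f * m[k + 1] / m[k] for k, f in enumerate(freqs)) / (n * a * q) - N


def fit_poisson_binomial(freqs, n, tol=1e-10, max_iter=50, h=1e-6):
    """Newton iteration (18) for p, then a from (10); requires n >= 2."""
    N = sum(freqs)
    mean = sum(k * f for k, f in enumerate(freqs)) / N
    var = sum(k * k * f for k, f in enumerate(freqs)) / N - mean ** 2
    p = min(max((var / mean - 1.0) / (n - 1), 0.01), 0.99)   # moment starting value
    c = falling_coeffs(len(freqs), n)
    for _ in range(max_iter):
        slope = (L_of_p(p + h, freqs, n, c) - L_of_p(p - h, freqs, n, c)) / (2 * h)
        step = L_of_p(p, freqs, n, c) / slope
        p -= step
        if abs(step) < tol:
            return p, mean / (n * p)
    raise RuntimeError("Newton iteration did not converge")


freqs = [50, 30, 25, 12, 6, 3, 1]        # hypothetical observed frequencies n_k, k = 0..6
p_hat, a_hat = fit_poisson_binomial(freqs, n=2)
print(p_hat, a_hat)
```

The mean equation (10) holds by construction, so only $p$ is iterated. As with any Newton scheme, a poor starting value could lead to a different root or to divergence, which is why the function raises an error instead of returning a silent result.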


3. Tabulation of $p_{k,\lambda}$ and $q_{k,\lambda}$ for $n = 2$


Since the $p_{k,\lambda}$ are written in terms of $\mu'_{[k]}$, the computation of the
$\mu'_{[k]}$ becomes the primary problem. We will first use the fact that all
the cumulants of a Poisson variable $X$ with mean $\lambda$ are $\lambda$. From
relations between moments and cumulants we will be able to calculate the
moments $\mu'_k$ of the variable $nX$ by using the relation

$$\mu'_k(nX) = E[(nX)^k] = n^k E(X^k). \qquad (19)$$

To obtain the factorial moments we note that $(nX)^{(k)}$ can be written

$$(nX)^{(k)} = S_1^k(nX) + S_2^k(nX)^2 + \cdots + S_k^k(nX)^k = \sum_{i=1}^{k}S_i^k(nX)^i. \qquad (20)$$
The upper index of $S_i^k$ is the degree of the polynomial being represented, and the lower index is the power of $nX$ with which a particular $S_i^k$ is associated. The numbers $S_i^k$ are called Stirling numbers of the first kind. A recurrence relation given in Richardson [5], which makes the calculation of the $S_i^k$ fairly simple, is

$$S_i^{k+1} = S_{i-1}^k - k\,S_i^k, \qquad (21)$$

where $S_1^1 = 1$ and $S_0^1 = 0$. Applying (21) we have, for example, $S_1^2 = S_0^1 - S_1^1 = 0 - 1 = -1$. It is a fairly rapid process, then, to build up a table of $S_i^k$ values from the fundamental recurrence relation (21). The factorial moments of $nX$ can now be written in terms of the ordinary moments as follows:

$$\mu_{[k]} = \sum_{i=1}^{k} S_i^k\,\mu'_i. \qquad (22)$$
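A minimal sketch of steps (19) to (22): build the $S_i^k$ with recurrence (21), obtain the moments of $nX$ from its cumulants $n^r\lambda$, and combine them as in (22). The moment–cumulant recursion $\mu'_r = \sum_{j=1}^{r}\binom{r-1}{j-1}\kappa_j\,\mu'_{r-j}$ is a standard textbook identity assumed here, not one stated in the paper, and the function names are illustrative. Exact fractions are used so that results can be compared term by term with polynomials in $\lambda$.

```python
from fractions import Fraction
from math import comb


def stirling_first(K):
    """Signed Stirling numbers of the first kind: S[k][i] holds S_i^k
    (degree k, power i), built from recurrence (21)."""
    S = [[0] * (K + 2) for _ in range(K + 1)]
    S[1][1] = 1                      # S_1^1 = 1, S_0^1 = 0
    for k in range(1, K):
        for i in range(1, k + 2):
            S[k + 1][i] = S[k][i - 1] - k * S[k][i]
    return S


def raw_moments_nX(lam, n, K):
    """Moments mu'_r of nX, X ~ Poisson(lam), from the cumulants n^r * lam.
    Uses the textbook moment-cumulant recursion (an assumption, not the paper's)."""
    kappa = [0] + [n ** r * lam for r in range(1, K + 1)]
    mu = [1] + [0] * K
    for r in range(1, K + 1):
        mu[r] = sum(comb(r - 1, j - 1) * kappa[j] * mu[r - j]
                    for j in range(1, r + 1))
    return mu


def factorial_moments_nX(lam, n, K):
    """mu_[k] = sum_i S_i^k mu'_i as in (22); index 0 holds mu_[0] = 1."""
    S = stirling_first(K)
    mu = raw_moments_nX(lam, n, K)
    return [1] + [sum(S[k][i] * mu[i] for i in range(1, k + 1))
                  for k in range(1, K + 1)]


fm = factorial_moments_nX(Fraction(3, 10), n=2, K=3)
# For n = 2: mu_[1] = 2 lam, mu_[2] = 2 lam + 4 lam^2, mu_[3] = 12 lam^2 + 8 lam^3
```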


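Since (9) will be a function involving $\lambda$ and $n$ only, it can be tabulated for different values of $\lambda$ and $n$. The $q_{1k}$ are easily obtainable from the $p_{1k}$ by relation (17). McGuire et al. [3] have stated that useful values for the integer $n$ may be expected to lie between 2 and 4. Since the Poisson binomial approaches the Neyman Type A quite rapidly as $n$ increases, a value of 2 is taken for $n$ in this paper. Other values of $n$ could also be taken, but they are not included here because of the excessive amount of computation involved. However, the functions $p_{1k}$ expressed in terms of $\lambda$ and $n$ are available and could be computed for other values of $n$ if desired. If $n = 2$, the $p_{1k}$ can be written in terms of $\lambda$ as follows:

$$\begin{aligned}
p_{1,0} &= 2\lambda, \qquad p_{1,1} = 1 + 2\lambda, \qquad p_{1,2} = \frac{6\lambda + 4\lambda^2}{1 + 2\lambda},\\
p_{1,3} &= \frac{6\lambda + 24\lambda^2 + 8\lambda^3}{6\lambda + 4\lambda^2}, \qquad
p_{1,4} = \frac{60\lambda^2 + 80\lambda^3 + 16\lambda^4}{6\lambda + 24\lambda^2 + 8\lambda^3},\\
p_{1,5} &= \frac{60\lambda^2 + 360\lambda^3 + 240\lambda^4 + 32\lambda^5}{60\lambda^2 + 80\lambda^3 + 16\lambda^4},\\
p_{1,6} &= \frac{840\lambda^3 + 1680\lambda^4 + 672\lambda^5 + 64\lambda^6}{60\lambda^2 + 360\lambda^3 + 240\lambda^4 + 32\lambda^5},\\
p_{1,7} &= \frac{840\lambda^3 + 6720\lambda^4 + 6720\lambda^5 + 1792\lambda^6 + 128\lambda^7}{840\lambda^3 + 1680\lambda^4 + 672\lambda^5 + 64\lambda^6},\\
p_{1,8} &= \frac{15120\lambda^4 + 40320\lambda^5 + 24192\lambda^6 + 4608\lambda^7 + 256\lambda^8}{840\lambda^3 + 6720\lambda^4 + 6720\lambda^5 + 1792\lambda^6 + 128\lambda^7},\\
p_{1,9} &= \frac{15120\lambda^4 + 151200\lambda^5 + 201600\lambda^6 + 80640\lambda^7 + 11520\lambda^8 + 512\lambda^9}{15120\lambda^4 + 40320\lambda^5 + 24192\lambda^6 + 4608\lambda^7 + 256\lambda^8}.
\end{aligned} \qquad (23)$$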
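Each expression in (23) equals a ratio $\mu_{[k+1]}/\mu_{[k]}$ of consecutive factorial moments of $2X$. The sketch below assumes this reading of $p_{1k}$ and generates the factorial moments by a route independent of the Stirling numbers: the factorial-moment generating function $E[(1+t)^{nX}] = \exp\{\lambda[(1+t)^n - 1]\}$, whose $t^k$ coefficient times $k!$ is $\mu_{[k]}$. It could be used to check or extend a table such as Table IV, but it is not offered as the authors' tabulation procedure.

```python
from fractions import Fraction
from math import comb, factorial


def _mul_trunc(a, b, K):
    """Product of two coefficient lists, truncated at degree K."""
    out = [0] * (K + 1)
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            if i + j > K:
                break
            out[i + j] += ai * bj
    return out


def factorial_moments_fmgf(lam, n, K):
    """mu_[0..K] of nX, X ~ Poisson(lam), via exp{lam[(1+t)^n - 1]}."""
    lam = Fraction(lam)
    base = [0] + [comb(n, i) for i in range(1, n + 1)]   # (1+t)^n - 1
    coeffs = [Fraction(0)] * (K + 1)
    power = [1] + [0] * K                                # base**0
    for m in range(K + 1):        # base**m starts at t**m, so m <= K suffices
        weight = lam ** m / factorial(m)
        for d in range(K + 1):
            coeffs[d] += weight * power[d]
        power = _mul_trunc(power, base, K)
    return [factorial(k) * coeffs[k] for k in range(K + 1)]


def p1_ratios(lam, n=2, K=9):
    """Assumed p_{1k} = mu_[k+1] / mu_[k] for k = 0, ..., K."""
    mu = factorial_moments_fmgf(lam, n, K + 1)
    return [mu[k + 1] / mu[k] for k in range(K + 1)]


print(p1_ratios(Fraction(1, 4), K=2))  # exact values 1/2, 3/2, 7/6
```

At $\lambda = 1/4$ these are $2\lambda$, $1 + 2\lambda$ and $(6\lambda + 4\lambda^2)/(1 + 2\lambda)$, matching the first three lines of (23).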
In the special case where $n = 2$, the estimating equations become

$$2\hat a\hat p = \bar x \qquad (24)$$
1 
= 244 — N=0 (25) 


L'(p) = [x — | (26) 


where the $p_{1k}$ and $q_{1k}$ can be found by calculating $\lambda$ from (4) and then writing down the corresponding $p_{1k}$ and $q_{1k}$ values, which appear in Table IV. Since second-order differences are small, linear interpolation is usually adequate, although quadratic interpolation would give greater accuracy.


4. Examples 
Example 1. 


For example, consider the data in Table II, which are taken from McGuire et al. [4]. In order to use (24), (25), and (26), we need to obtain preliminary estimates of $a$, $p$, and $q$. This can be accomplished by using a moment estimate for $p$, which is given by

$$\hat p_1 = \frac{s^2 - \bar x}{(n-1)\,\bar x}, \qquad (27)$$

where $\bar x$ is the sample mean and $s^2$ is the sample variance. This leads to the preliminary estimates $\hat p_1 = .5798$, $\hat q_1 = .4202$. From equation (24) we have $\hat a_1 = 1.4160$, and from (4) $\hat\lambda_1 = (1.4160)(.4202)^2 = .250$. Linear interpolation in Table IV yields $\sum a_k p_{1k} = 404.898$ and $\sum a_k q_{1k} = 38.44$, which give, from (25) and (26), $L(\hat p_1) = 16.21$ and $L'(\hat p_1) = 188.7$. Using equation (18) we obtain $\hat p_2 = .5798 - 16.21/188.7 = .49$. The results of several more iterations are shown in Table I.
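The starting values above follow from (27), (24) and (4). A minimal sketch, assuming the standard Poisson binomial mean $nap$ and variance $nap[1 + (n-1)p]$ (so that $s^2/\bar x = 1 + (n-1)p$, which (27) inverts) and $\lambda = aq^n$ as used in the example; `moment_start` is a hypothetical helper name, and the round-trip values are made up rather than taken from the data.

```python
def moment_start(xbar: float, s2: float, n: int = 2):
    """Preliminary (p, a, lambda) from the sample mean and variance.
    Assumes mean n*a*p and variance n*a*p*(1 + (n - 1)*p)."""
    p = (s2 - xbar) / ((n - 1) * xbar)   # (27): s2/xbar = 1 + (n - 1) p
    a = xbar / (n * p)                   # mean equation n a p = xbar, cf. (24)
    lam = a * (1.0 - p) ** n             # lambda = a q^n, cf. (4)
    return p, a, lam


# Round trip with made-up values p = 0.4, a = 2, n = 2:
xbar = 2 * 2.0 * 0.4                     # n a p = 1.6
s2 = xbar * (1 + 0.4)                    # n a p [1 + (n - 1) p] = 2.24
print(moment_start(xbar, s2))            # approximately (0.4, 2.0, 0.72)

# The lambda step of Example 1: a_1 q_1^2 with a_1 = 1.4160 and q_1 = .4202.
lam1 = 1.4160 * 0.4202 ** 2              # about 0.2500
```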

The final estimates are $\hat p = .4030$, $\hat a = 2.0372$, $\hat\lambda = .726$. We can calculate the successive probabilities by using the simplified recurrence relation (8), since the values of $p_{1k}$ are readily available. In order to get $P(0)$, we notice from (1) that $P(0) = e^{a(q^n - 1)} = e^{\lambda - a}$. In our case $P(0) = e^{\hat\lambda - \hat a} = .269491$. The application of recurrence formula (8) leads to the frequencies given in column five of Table II. Columns three and four show the expected frequencies given by the method of moments and by the method of maximum likelihood, respectively, using the longer recurrence relation (2).

TABLE I
RESULTS OF FOUR ITERATIONS

 i    p_i      λ_i     L(p_i)/L'(p_i)
 1    .5798    .250    .09
 2    .49      .436    .06
 3    .43      .620
 4             .697    .007
 5    .4030    .726    .0000

TABLE II
COMPARISON OF THE FITTING OF OBSERVED FREQUENCIES OF P. NUBILALIS
BY VARIOUS METHODS TO THE POISSON BINOMIAL DISTRIBUTION

 k    Observed      Expected      Expected      Expected
      frequency¹    frequency²    frequency³    frequency⁴
 1    …             69.66         85.70         85.58
 2    57            72.09         70.86         70.83
 3    44            38.69         42.05         42.02
 4    …             23.83         22.01         22.01
 5    …             10.65          9.37          9.87
 6    …              5.01          4.04          4.04
 7    …              1.94          1.50          1.50
      χ²            18.90          9.38          9.21
      P(χ²)         < .005         .05           .06

¹Data from McGuire et al. [4].
²Computed using moment estimates and recurrence relation (2) (cf. McGuire et al. [4]).
³Computed using maximum likelihood estimates and recurrence relation (2) (cf. Sprott [7]).
⁴Computed using maximum likelihood estimates given by (24), (25), and (26) and the shorter recurrence relation (8).

The formulae for the variances and covariance of the maximum likelihood estimates have been given by Sprott [7] as
where


T,,/N = —naqgqA +nq+p, and D,, = na(n a) A — (n — 1)? 


with 


A=-1+ (29) 
Using (13), equation (29) can be rewritten in terms of the $p_{1k}$ as
1 
A= -1+ D (30) 


Thus, we can compute $A$ in (29) by using the $p_{1k}$ values corresponding to the final estimates of the parameters and the final probabilities $P(k)$. This leads to

$$A = -1 + \frac{6.36950}{[2(2.0372)(.5970)]^2} = .07654.$$

The variances and covariance as computed from (28) are $\sigma^2_{\hat p} = .0102$, $\sigma^2_{\hat a} = .2632$, $\sigma_{\hat a\hat p} = -.0508$.


Example 2.


The procedure of fitting the Poisson binomial has also been applied to the data in Table III, taken from McGuire et al. [4]. The original estimates given by moments were $\hat p = .2474$, $\hat a = .8296$, $\hat\lambda = .470$. The final estimates given by maximum likelihood are $\hat p = .2563$, $\hat a = .8296$, $\hat\lambda = .470$.

The expected frequencies obtained by maximum likelihood using Table IV are shown in column five of Table III. If the large-sample variances and covariance are again computed from equations (28) and (30), we have $\sigma^2_{\hat p} = .0019$, $\sigma^2_{\hat a} = .0180$, $\sigma_{\hat a\hat p} = -.0056$.


5. Comments 


It can be seen from the two examples that the expected frequencies calculated by using Table IV differ only slightly from the expected frequencies obtained with the longer recurrence relation. This difference is due to the linear interpolation employed in using Table IV. For most practical purposes linear interpolation is adequate. Machine interpolation is simplified by the following formula. If we have $x_1$, $f(x_1)$ and $x_2$, $f(x_2)$, and we wish to approximate $f(x_1 + k)$, where

$$0 < k < |x_2 - x_1| = h,$$

we can write

$$f(x_1 + k) \approx \left(1 - \frac{k}{h}\right) f(x_1) + \frac{k}{h}\, f(x_2), \qquad (31)$$

which is a form convenient for machine calculation. For example, if $\lambda = .143$ in Table IV, the interpolated value is $(.85)(1.81617) + (.15)(1.87833) = 1.82549$.

TABLE III
COMPARISON OF THE FITTING OF OBSERVED FREQUENCIES OF P. NUBILALIS
BY VARIOUS METHODS TO THE POISSON BINOMIAL DISTRIBUTION

 k    Observed      Expected      Expected      Expected
      frequency¹    frequency²    frequency³    frequency⁴
 0    907           904.44        906.09        906.18
 1    275           279.42        276.61        276.69
 2     88            89.09         89.89         89.92
 3     23            18.63         18.85         18.86
 4      3             4.31          4.56          4.35
      χ²               .50           .34           …
      P(χ²)            .49           .58           .54

¹Data from McGuire et al. [4].
²Computed using moment estimates and recurrence relation (2) (cf. McGuire et al. [4]).
³Computed using maximum likelihood estimates and recurrence relation (2) (cf. Sprott [7]).
⁴Computed using maximum likelihood estimates given by (24), (25), and (26).
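Formula (31), written for machine use, is a weighted average with weights $1 - k/h$ and $k/h$. In this minimal sketch the function name is illustrative, and it assumes $x_1 < x_2$. The usage line re-does the worked example, where the weights are .85 and .15; the spacing is scaled to 1 because only the weights matter.

```python
def interpolate(x1: float, f1: float, x2: float, f2: float, x: float) -> float:
    """Linear interpolation f(x1 + k) ~ (1 - k/h) f(x1) + (k/h) f(x2), h = x2 - x1."""
    h = x2 - x1
    k = x - x1
    if not 0 <= k <= h:
        raise ValueError("x must lie between x1 and x2 (with x1 < x2)")
    w = k / h
    return (1 - w) * f1 + w * f2


# Weights .85 and .15, as in the worked example:
value = interpolate(0.0, 1.81617, 1.0, 1.87833, 0.15)   # about 1.82549
```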
6. Truncated distribution with zero class missing 

In some cases it may be useful to consider fitting the truncated Poisson binomial distribution in which the zero class is missing. The maximum likelihood equations become³

$$\frac{n\hat a\hat p}{1 - P(0)} = \bar x', \qquad (32)$$

$$L(p) = \sum_{k \ge 1} a'_k F(k) - N' - \frac{N'P(0)\,(1 - q^{n-1})}{1 - P(0)} = 0, \qquad (33)$$

where $\bar x'$, $s'^2$, and $N'$ are calculated without the zero class. We may use the result given in (13) to write equation (33) as

nap
N'P(0)
1 − P(0)
− N' = 0.   (34)

Thus, it becomes possible to compute $L(\hat p)$ by using Table IV. The calculation of $L'(\hat p)$ is probably too complicated to be worthwhile. Nevertheless, given two approximations $\hat p_i$ and $\hat p_{i+1}$, a better one, $\hat p_{i+2}$, can be found via

$$\hat p_{i+2} = \frac{\hat p_i\,L(\hat p_{i+1}) - \hat p_{i+1}\,L(\hat p_i)}{L(\hat p_{i+1}) - L(\hat p_i)}. \qquad (35)$$

Also, having found $\hat p_{i+2}$ from above, a better approximation will be given by

Dias Dise Diss) 27.41 + Dil,   (36)

(cf. Hartree [2]). After we have estimated the parameters we may proceed as before, excluding the zero class from our computations.

³The authors acknowledge the suggestions of D. A. Sprott in this section.
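Equation (35) is linear inverse interpolation between two trial values (the secant method), which avoids computing $L'(p)$. The following minimal sketch uses a toy `L` that stands in for the left-hand side of (34), which in practice would be evaluated from Table IV. The further refinement (36) after Hartree is not reproduced.

```python
from typing import Callable


def secant(L: Callable[[float], float], p0: float, p1: float,
           tol: float = 1e-8, max_iter: int = 50) -> float:
    """Repeat p_{i+2} = [p_i L(p_{i+1}) - p_{i+1} L(p_i)] / [L(p_{i+1}) - L(p_i)]."""
    L0, L1 = L(p0), L(p1)
    for _ in range(max_iter):
        if L1 == L0:
            raise ZeroDivisionError("L(p_i) and L(p_{i+1}) coincide")
        p2 = (p0 * L1 - p1 * L0) / (L1 - L0)
        if abs(p2 - p1) < tol:
            return p2
        p0, L0, p1, L1 = p1, L1, p2, L(p2)
    raise RuntimeError("secant iteration did not converge")


root = secant(lambda p: p ** 3 - 0.125, 0.9, 0.8)   # toy L with root p = 0.5
```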


7. Summary 


The fitting of the Poisson binomial distribution by maximum likeli- 
hood is considered. The maximum likelihood and recurrence relations 
are rewritten in terms of ratios of Poisson factorial moments and these 
ratios are tabulated for values of the parameters when n = 2. A simple 
example illustrates the computational procedure and shows how the 
labor in fitting may be reduced considerably by using the tables. 
A MODEL FOR THE ANALYSIS OF THE DISTRIBUTION
OF QUALITATIVE CHARACTERS IN SIBSHIPS 


A. M. 
Office of Biostatistics 


State Department of Health, 
Albany, New York, U.S. A. 


The basic treatment of the problem of estimating the frequencies 
of recessive characters through sibship data begins with the consideration 
of sibships of a given size and with the binomial assumption for the 
number of recessive children in a sibship. Estimates for each size 
group are obtained separately and then combined on the basis of an 
‘a posteriori’ size distribution. Fisher [1934], Haldane [1938] and 
Bailey [1951] have utilized such a procedure in obtaining estimates 
of the segregation ratio for a particular trait from data collected under 
alternative methods. 

There is a question whether the binomial distribution holds, for it 
is possible that the sibship size depends on the composition. In an 
extreme case, parents may decide to limit their family on the basis 
of producing a child with a defect they consider to be hereditary or 
likely to appear in a subsequent birth. The present paper, through 
the introduction of ‘a priori’ size distributions, represents an attempt 
to account for such a sequential procedure by which the total family 
may be obtained as well as the natural size variability arising out of 
differences in fertility between couples. 

At the outset of reproduction, each couple may be thought of as possessing a potential family size ($N$) jointly mediated by fecundity and by the desire for a particular number of children, say the "ideal" size. We shall ignore the possibilities of the loss of children through death and of the dissolution and reconstruction of marriage through divorce and remarriage, focusing attention solely on completed sets of children (sibships) born to the same parental pair. The number of children obtained during the course of reproductive history, the realized size $T$, will depend on the couple's success in attaining their ideal size. They may fail to do so through relative infertility, through faulty contraceptive technique, or through a downward revision in the ideal occasioned by the production of one or more children affected by the condition under study. In the latter circumstance, such selective termination ensures that $T$ will depend on family composition.


I. GENERAL MODEL 


To formalize the above considerations, we may begin by defining the following random variables and corresponding frequency and generating functions:

$N$: the potential sibship size, or the composition-independent limit on the total number of children in a sibship.

$T$: the realized sibship size, or the total number of children in a completed sibship.

$M$: the number of children in a sibship affected with the condition under study. $M$ is assumed to be the sum of $k$ independent binomial variables,

$$M = X_1 + \cdots + X_k, \qquad X_a = \begin{cases} 0 & \text{if the } a\text{th child is not affected},\\ 1 & \text{if the } a\text{th child is affected}. \end{cases}$$

$A$: the number of independent ascertainments of a sibship. $A$ is assumed to be the sum of $j$ independent binomial variables,

$$A = Y_1 + \cdots + Y_j, \qquad Y_b = \begin{cases} 0 & \text{if the } b\text{th child is not detected},\\ 1 & \text{if the } b\text{th child is detected}. \end{cases}$$

$$\begin{aligned}
f_{kn} &= \Pr(T = k \mid N = n), & f(s) &= \text{the p.g.f. of } \Pr(T = k),\\
g_{jk} &= \Pr(M = j \mid T = k), & g(s) &= \text{the p.g.f. of } \Pr(X_a),\\
h_{ij} &= \Pr(A = i \mid M = j), & h(s) &= \text{the p.g.f. of } \Pr(Y_b).
\end{aligned}$$

(The abbreviation p.g.f. is used for probability generating function.)


Under this setup, we shall assume the joint probability of the three random variables $A$, $M$, and $T$ to be given by


$N$, the biological or composition-independent limit on sibship size, will be observable only for sibships which are not terminated selectively. In this case, $N$ is the same as the realized size $T$, and $f_{kn}$ may be written as $f_k$. The appropriate form for $f_k$ will depend on the fertility and family-limitation practices of the population under study. Lotka [1939] found certain American family statistics to be described by

$$f_0 = 1 - \frac{ba}{1-a} \quad\text{and}\quad f_k = b\,a^k \quad\text{for } k \ge 1,$$

a distribution which is a geometric series with the exception of the first term. Kiser and Whelpton [1944] display the size distributions of 6,551 completed families classified by such characteristics as religion, education, economic status and age of mother at marriage. Most of these are adequately described by Poisson series.

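The probability $g_{jk}$ that a sibship has $j$ affected children, given $k$ total children, depends on the termination rule which the parents follow. With no selective termination, the joint probability of $A$, $M$ and $T$ is

$$p_{ijk} = f_k\, g_{jk}\, h_{ij} \qquad \text{for } 0 \le i \le j \le k. \qquad (2)$$

Parents with the rule of stopping procreation at the birth of the first $r$ affected children will continue until either $T = n$ or $M = r$, whichever occurs first. In this case,

$$p_{ijk} = \begin{cases} f_k\, g_{jk}\, h_{ij} & \text{for } 0 \le i \le j < r,\\[4pt] \left(\sum_{n=k}^{\infty} f_n\right) g_{rk}\, h_{ir} & \text{for } 0 \le i \le j = r. \end{cases} \qquad (3)$$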
For genetically determined, completely penetrant characters, it is appropriate to assume that the probability a child is affected is constant between and within sibships arising from a given mating type, i.e.,

$$\Pr(X_a = 0) = 1 - p = q \quad\text{and}\quad \Pr(X_a = 1) = p.$$

Under no selective stoppage, $g_{jk}$ will be the binomial

$$g_{jk} = \binom{k}{j} p^j q^{k-j},$$

while under stoppage at the first $r$ affected children

$$g_{jk} = \begin{cases} \binom{k}{j} p^j q^{k-j} & \text{for } j < r,\\[4pt] \binom{k-1}{r-1} p^r q^{k-r} & \text{for } j = r. \end{cases} \qquad (4)$$
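To make (4) concrete, the sketch below enumerates the joint distribution of $(M, T)$ for one couple with potential size fixed at $N = n$ who stop at the $r$th affected child. Either fewer than $r$ affected children occur and $T = n$, or the $r$th affected child arrives at birth $k \le n$. The probabilities summing to one is a quick check that the two branches of (4) partition the outcomes. The function name and the fixed-$n$ setting are illustrative assumptions.

```python
from math import comb


def stopping_pmf(n: int, r: int, p: float) -> dict:
    """Pr(M = j, T = k | N = n) when reproduction stops at the r-th affected child."""
    q = 1.0 - p
    pmf = {}
    for j in range(min(r - 1, n) + 1):      # fewer than r affected: T = n
        pmf[(j, n)] = comb(n, j) * p ** j * q ** (n - j)
    for k in range(r, n + 1):               # r-th affected child is birth k <= n
        pmf[(r, k)] = comb(k - 1, r - 1) * p ** r * q ** (k - r)
    return pmf


pmf = stopping_pmf(n=6, r=2, p=0.3)
total = sum(pmf.values())                   # 1 up to rounding
```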


The case of incomplete penetrance may be encompassed by assuming $p$ to be variable or by assuming correlation between the outcomes of successive births. In either instance, we are led to consider compound distributions of the form

$$g^*_{jk} = \int g_{jk}\, f(p)\, dp.$$


Methods of Ascertainment

Two methods of collecting sibship data will be discussed:

A. Complete ascertainment through affected children, whereby all sibships with at least one affected child are observable. In this case, ascertainment is taken to be a certainty for a sibship with at least one affected child and an impossibility for a sibship with no affected child:

$$\Pr(\text{ascertainment} \mid M = j) = \begin{cases} 0, & j = 0,\\ 1, & j > 0. \end{cases}$$

The probability a sibship is of composition $(j, k)$ is conditional on $M > 0$. For $N = T = n$ fixed, such a situation has been treated by Fisher [1934], who describes the type of selection as the "proband" method.

B. Incomplete multiple ascertainment through affected children, whereby sibships come under observation with probability dependent on the number of affected children they contain, and may be ascertained as many times as the number they contain. For $N = T = n$ fixed, Fisher, using the term "sib" method, and Bailey, using the term "incomplete multiple selection", derive estimates of $p$ assuming

$$h_{ij} = \binom{j}{i} p'^{\,i} q'^{\,j-i},$$

where $p' = 1 - q'$ is the constant probability of detecting an affected child. This assumption implies that affected children are sampled independently and with replacement, a circumstance which may rarely be realized in practice.


Estimation

The following observational quantities will be required for estimation purposes:

$w_{ijk}$: the number of sibships of composition $(j, k)$ ascertained exactly $i$ times.
$w = \sum_{i,j,k} w_{ijk}$: the total sibships.   $n = \sum_{i,j,k} k\,w_{ijk}$: the total children.
$m = \sum_{i,j,k} j\,w_{ijk}$: the total affected.   $t = \sum_{i,j,k} i\,w_{ijk}$: the total ascertainments.


II. THE CASE OF NO SELECTIVE LIMITATION 


Parents terminating reproduction without respect to the composition of their family present the simplest case for study. Since potential size $N$ and realized size $T$ are the same, $T$ has frequency function $f_k$. The number of affected children in a sibship is a compound variable with generating function $f[g(s)]$, and $A$, the number of ascertainments, is a compound variable with generating function $f\{g[h(s)]\}$. A particularly manageable form for $p_{ijk}$ arises when it is assumed that $f_k$ is a geometric series and $g_{jk}$ and $h_{ij}$ are the binomials:

$$\begin{aligned}
f_k &= (1-a)a^k, & f(s) &= \frac{1-a}{1-as},\\
g_{jk} &= \binom{k}{j} p^j q^{k-j}, & g(s) &= q + ps,\\
h_{ij} &= \binom{j}{i} p'^{\,i} q'^{\,j-i}, & h(s) &= q' + p's.
\end{aligned} \qquad (5)$$

Under this setup, the compound variables will also be geometric, since

$$G(s) = f[g(s)] = \frac{1-a}{1 - a(q + ps)} = \frac{1-z}{1-zs},$$

which is the generating function of a geometric series with parameter $z = ap/(1 - aq)$, and

$$H(s) = G[h(s)] = \frac{1-z}{1 - z(q' + p's)} = \frac{1-y}{1-ys},$$

which is the generating function of a geometric series with parameter $y = app'/(1 - a + app')$. Hence $M$ and $A$ have frequency functions

$$\Pr(M = j) = (1-z)z^j \quad\text{and}\quad \Pr(A = i) = (1-y)y^i. \qquad (6)$$

The probability that a sibship has at least one affected child and the probability that a sibship is ascertained at least once are given by

$$\Pr(M > 0) = z \quad\text{and}\quad \Pr(A > 0) = y.$$

Assuming $f_k$ is a Poisson series with parameter $\theta$, and $g_{jk}$ and $h_{ij}$ binomial as before, the compound variables $M$ and $A$ will be Poisson with parameters $\theta p$ and $\theta pp'$ respectively, since

$$G_1(s) = e^{\theta p(s-1)} \quad\text{and}\quad H_1(s) = e^{\theta pp'(s-1)}.$$

For mixed Poisson distributions such as those occurring in the sample obtained by Kiser and Whelpton in their Indianapolis study, it may be appropriate to treat $\theta$ as variable. Compounding $f_k$ with a gamma distribution on $\theta$ leads to the negative binomial $[(1-a)/(1-as)]^{\nu}$, in which case the new compound variables $M$ and $A$ are also of the same form, since their generating functions are

$$G_2(s) = \left[\frac{1-z}{1-zs}\right]^{\nu} \quad\text{and}\quad H_2(s) = \left[\frac{1-y}{1-ys}\right]^{\nu},$$

where $z$ and $y$ are as defined in (6).

Estimation of the parameters under various sets of assumptions for the two methods of ascertainment may be illustrated as follows:
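The geometric results can be checked numerically by summing the compound series directly. $\Pr(M = j) = \sum_k (1-a)a^k\binom{k}{j}p^jq^{k-j}$ should equal $(1-z)z^j$, and thinning $M$ by the detection probability $p'$ should give $\Pr(A = 0) = 1 - y$. This is a minimal sketch with arbitrary parameter values. Truncating the infinite sums at `kmax` and `jmax` terms is an assumption that is harmless when $a$ is well below 1.

```python
from math import comb


def pr_M(j, a, p, kmax=400):
    """Pr(M = j) by direct summation over sibship size k (truncated at kmax)."""
    q = 1.0 - p
    return sum((1 - a) * a ** k * comb(k, j) * p ** j * q ** (k - j)
               for k in range(j, kmax))


def pr_A0(a, p, pp, jmax=200):
    """Pr(A = 0) = sum_j Pr(M = j) q'^j (truncated at jmax)."""
    return sum(pr_M(j, a, p) * (1 - pp) ** j for j in range(jmax))


a, p, pp = 0.6, 0.3, 0.5
z = a * p / (1 - a * (1 - p))
y = a * p * pp / (1 - a + a * p * pp)
print(pr_M(2, a, p), (1 - z) * z ** 2)      # the two values agree
print(pr_A0(a, p, pp), 1 - y)
```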
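A. Complete ascertainment through affected children

Under the geometric–binomial–binomial model given in (5), and under complete selection of all sibships with at least one affected child, the joint probability of $M$ and $T$ is conditional on $M > 0$. Since $\Pr(M > 0) = z$, for an observed set of $w$ sibships, of which $w_{jk}$ are of composition $(j, k)$, the likelihood function is given by

$$L = \prod_{j,k} \left[\frac{(1-a)a^k \binom{k}{j} p^j q^{k-j}}{z}\right]^{w_{jk}}.$$

Maximizing $L$ with respect to the two parameters leads to the equations

$$\begin{aligned}
S_a &= \frac{n}{a} - \frac{w}{1-a} - \frac{w}{a(1-aq)} = 0,\\
S_p &= \frac{m}{p} - \frac{n-m}{q} - \frac{w(1-a)}{p(1-aq)} = 0.
\end{aligned}$$

The solutions are the maximum likelihood estimates

$$\hat p = \frac{m^2 - w^2}{mn - w^2} \quad\text{and}\quad \hat a = \frac{mn - w^2}{m(n + w)}. \qquad (7)$$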


Expressions for the asymptotic variances of $\hat p$ and $\hat a$ may be obtained by inverting the information matrix in the usual fashion. Algebraically, these expressions are not convenient and will be left unstated.

Alternatively, the problem may be treated in the fashion of Bailey [1951], whereby it is assumed that the population under study contains $W$ couples capable of producing affected children, of whom $w$ actually do so. In this case $W$ is a parameter to be estimated and $w$ is an observable variable with frequency function

$$\Pr(w) = \binom{W}{w} z^w (1-z)^{W-w}.$$

The likelihood of an observed set is hence given by

$$L \propto \frac{W!}{(W-w)!}\,(1-z)^{W-w} \prod_{j,k}\left[(1-a)a^k \binom{k}{j} p^j q^{k-j}\right]^{w_{jk}}.$$

The likelihood equations are given by

$$\begin{aligned}
S_W &= \psi(W+1) - \psi(W+1-w) + \log(1-z) = 0,\\
S_p &= \frac{m}{p} - \frac{n-m}{q} - \frac{(W-w)a}{1-aq} = 0, \qquad (8)\\
S_a &= \frac{n}{a} - \frac{w}{1-a} - \frac{(W-w)p}{(1-a)(1-aq)} = 0,
\end{aligned}$$

where $\psi(x) = \frac{d}{dx}\log(x-1)!$.

For large $W$, $\psi(W) \approx \log W$, and the equation $S_W = 0$ will be satisfied approximately by $W = w/z$. Eliminating $W$ from the last two equations in (8) leads to the relation

$$\hat q = \frac{n-m}{(n+w)\hat a}.$$

Substituting $W = w/z$ in the equation $S_p = 0$ turns it into the second likelihood equation of the complete-ascertainment case, so that approximate maximum likelihood estimates are obtained as

$$\hat p = \frac{m^2 - w^2}{mn - w^2}, \qquad \hat a = \frac{mn - w^2}{m(n+w)}, \qquad \hat W = \frac{w}{\hat z}.$$


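When $f_k$ is assumed to be Poisson with parameter $\theta$ and $g_{jk}$ binomial as before, the likelihood of an observed set $\{w_{jk}\}$ is given by

$$L \propto \frac{W!}{(W-w)!}\, e^{-(W-w)\theta p} \prod_{j,k}\left[\frac{e^{-\theta}\theta^k}{k!}\binom{k}{j} p^j q^{k-j}\right]^{w_{jk}}.$$

The likelihood equations are

$$\begin{aligned}
S_\theta &= -w + \frac{n}{\theta} - (W-w)p = 0,\\
S_p &= \frac{m}{p} - \frac{n-m}{q} - (W-w)\theta = 0, \qquad (9)\\
S_W &= \psi(W+1) - \psi(W+1-w) - \theta p = 0.
\end{aligned}$$

The first two equations lead to the relations

$$\hat p = \frac{wm}{wm + W(n-m)}, \qquad \hat\theta = \frac{wm + W(n-m)}{wW}, \qquad (10)$$

so that $\hat\theta\hat p = m/W$, and $S_W = 0$ may be written as

$$\frac{1}{W} + \frac{1}{W-1} + \cdots + \frac{1}{W-w+1} = \frac{m}{W}$$

and solved by numerical methods. Alternatively, for large $W$ the equation will be satisfied approximately by

$$\frac{W}{W-w} = e^{m/W}.$$

The solution of either of the above will be taken as an approximate maximum likelihood estimate of $W$. Estimates of $\theta$ and $p$ may then be obtained by substitution in (10).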
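A minimal sketch of the numerical step. Because $W\sum_{i=0}^{w-1}1/(W-i)$ decreases from $+\infty$ toward $w$ as $W$ grows, the equation has a single root above $w - 1$ whenever $m > w$, so plain bisection suffices. The estimates then follow from $\hat p = wm/[wm + W(n-m)]$ and $\hat\theta = [wm + W(n-m)]/(wW)$. The totals in the usage line are invented for illustration.

```python
def solve_W(w: int, m: float, tol: float = 1e-12) -> float:
    """Root of 1/W + 1/(W-1) + ... + 1/(W-w+1) = m/W, found by bisection."""
    if m <= w:
        raise ValueError("a finite root requires m > w")

    def h(W: float) -> float:       # strictly decreasing for W > w - 1
        return W * sum(1.0 / (W - i) for i in range(w)) - m

    lo, hi = w - 1 + 1e-9, 2.0 * w
    while h(hi) > 0:                # widen until the root is bracketed
        hi *= 2
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_bailey(w: int, n: float, m: float):
    """(W, p, theta): solve S_W = 0 for W, then apply relations (10)."""
    W = solve_W(w, m)
    p = w * m / (w * m + W * (n - m))
    theta = (w * m + W * (n - m)) / (w * W)
    return W, p, theta


# Invented totals, for illustration only:
W, p, theta = poisson_bailey(w=20, n=70, m=30)
```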
B. Incomplete Multiple Ascertainment

In circumstances where the probability $p'$ of detecting an affected child is less than 1, the greater the number of affected children in a sibship, the greater the probability that the sibship will come under observation. If each propositus is identifiable, and the number of independent ascertainments of each sibship is hence observable, it is appropriate to consider the conditional probability

$$\Pr(A = i, M = j, T = k \mid A > 0) \qquad \text{for } 0 < i \le j \le k.$$

For $f_k$ geometric with parameter $a$, and $g_{jk}$ and $h_{ij}$ binomials with parameters $p$ and $p'$ respectively, the likelihood of an observed set $\{w_{ijk}\}$ of sibships ascertained $t$ times in all and comprising a total of $m$ affected among $n$ children is given by

$$L = \prod_{i,j,k}\left[\frac{(1-a)a^k \binom{k}{j} p^j q^{k-j} \binom{j}{i} p'^{\,i} q'^{\,j-i}}{y}\right]^{w_{ijk}}.$$

Maximizing $L$ with respect to the three parameters and solving the resulting likelihood equations leads to the estimates

$$\hat a = \frac{nt - w^2}{t(n+w)}, \qquad \hat p = \frac{mt - w^2}{nt - w^2}, \qquad \hat p' = \frac{t^2 - w^2}{mt - w^2}. \qquad (11)$$
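These closed forms can be evaluated directly from the totals $w$, $n$, $m$, $t$. In the sketch below, `incomplete_multiple` implements $\hat a = (nt - w^2)/[t(n+w)]$, $\hat p = (mt - w^2)/(nt - w^2)$ and $\hat p' = (t^2 - w^2)/(mt - w^2)$ for the geometric–binomial–binomial model. Complete ascertainment is treated as the special case $t = m$ (every affected child detected, $p' = 1$), which gives $\hat p = (m^2 - w^2)/(mn - w^2)$ and $\hat a = (mn - w^2)/[m(n+w)]$. The totals in the usage lines are not data from the paper. They are the expected totals for 20 (respectively 6) sibships under the assumed values $a = p = p' = \tfrac12$, chosen so that the estimates must come back exactly $\tfrac12$.

```python
from fractions import Fraction


def incomplete_multiple(w, n, m, t):
    """(a, p, p') for geometric-binomial-binomial sibships, conditional on A > 0."""
    a = Fraction(n * t - w * w, t * (n + w))
    p = Fraction(m * t - w * w, n * t - w * w)
    pp = Fraction(t * t - w * w, m * t - w * w)
    return a, p, pp


def complete(w, n, m):
    """(a, p) under complete ascertainment: the special case t = m, p' = 1."""
    a, p, pp = incomplete_multiple(w, n, m, t=m)
    assert pp == 1
    return a, p


# Expected totals under a = p = p' = 1/2 (illustrative, not data):
print(incomplete_multiple(w=20, n=52, m=34, t=25))  # (Fraction(1, 2), Fraction(1, 2), Fraction(1, 2))
print(complete(w=6, n=14, m=9))                     # (Fraction(1, 2), Fraction(1, 2))
```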

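The alternative Bailey-type formulation would be to assume that the population contains $W$ sibships with at least one affected child, of which $w$ are ascertained at least once. For $f_k$ a Poisson series with parameter $\theta$, and $g_{jk}$ and $h_{ij}$ binomials as before, the random variable $A$ is Poisson with parameter $\theta pp'$, and the probability of 0 ascertainments is seen to be $e^{-\theta pp'}$. For $w$ sibships ascertained $t$ times, the likelihood function is

$$L \propto \frac{W!}{(W-w)!}\, e^{-(W-w)\theta pp'} \prod_{i,j,k}\left[\frac{e^{-\theta}\theta^k}{k!}\binom{k}{j}p^j q^{k-j}\binom{j}{i}p'^{\,i}q'^{\,j-i}\right]^{w_{ijk}}.$$

The likelihood equations are

$$\begin{aligned}
S_\theta &= \frac{n}{\theta} - w - (W-w)pp' = 0,\\
S_p &= \frac{m}{p} - \frac{n-m}{q} - (W-w)\theta p' = 0,\\
S_{p'} &= \frac{t}{p'} - \frac{m-t}{q'} - (W-w)\theta p = 0, \qquad (12)\\
S_W &= \psi(W+1) - \psi(W+1-w) - \theta pp' = 0.
\end{aligned}$$

The first three equations in (12) lead to the relations

$$\hat\theta = \frac{nW - t(W-w)}{wW}, \qquad \hat p = \frac{mW - t(W-w)}{nW - t(W-w)}, \qquad \hat p' = \frac{tw}{mW - t(W-w)}, \qquad (13)$$

so that $\hat\theta\hat p\hat p' = t/W$. Substituting these relations in $S_W = 0$, $W$ may be obtained by numerical solution of the resulting equation

$$\frac{1}{W} + \frac{1}{W-1} + \cdots + \frac{1}{W-w+1} = \frac{t}{W}.$$

When $W$ is large, $S_W = 0$ will be approximately satisfied by

$$\frac{W}{W-w} = e^{t/W}.$$

The resulting estimate of $W$ may then be substituted in (13) in order to obtain estimates of the three parameters $\theta$, $p$ and $p'$.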
III. TERMINATION WITH THE FIRST r AFFECTED CHILDREN


Under the geometric–binomial–binomial model, the probabilities associated with sibships of parents terminating reproduction at the birth of the first $r$ affected children are given by

$$\Pr(M = j, T = k) = \begin{cases} (1-a)a^k \binom{k}{j} p^j q^{k-j} & \text{for } j < r,\\[4pt] a^k \binom{k-1}{r-1} p^r q^{k-r} & \text{for } j = r. \end{cases}$$

The frequency function of the number of affected children in a sibship, $M$, is from (6) given by

$$\Pr(M = j) = \begin{cases} (1-z)z^j & \text{for } j < r,\\ z^r & \text{for } j = r, \end{cases}$$

and the probability that a sibship has at least one affected child is seen to be $z$. Estimates of the parameters under the two methods of ascertainment may be obtained as follows:


A. Complete Ascertainment through Affected Children

Since all sibships with affected children are ascertained, $p' = 1$ and the joint density of $M$ and $T$ is conditional on $M > 0$. The probability a sibship is of composition $(j, k)$ is hence given by

$$\Pr(M = j, T = k \mid M > 0) = \begin{cases} \dfrac{(1-a)a^k \binom{k}{j} p^j q^{k-j}}{z} & \text{for } 0 < j < r,\\[8pt] \dfrac{a^k \binom{k-1}{r-1} p^r q^{k-r}}{z} & \text{for } j = r. \end{cases}$$

When parents terminate reproduction at the first birth of an affected child ($r = 1$), the joint probability is

$$\Pr(M = 1, T = k \mid M > 0) = \Pr(T = k \mid M = 1) = (1 - aq)(aq)^{k-1} \qquad \text{for } k = 1, 2, 3, \ldots \qquad (14)$$

Examination of (14) reveals that the two parameters are not separately estimable in this instance.

For termination with the first $r$ affected children ($r \ge 1$), the conditional frequency function of $M$ is

$$\Pr(M = j \mid M > 0) = \begin{cases} (1-z)z^{j-1} & \text{for } 0 < j < r,\\ z^{r-1} & \text{for } j = r. \end{cases}$$

The first two conditional moments of $M$ are given by

$$E(M \mid M > 0) = \frac{1 - z^r}{1 - z} \quad\text{and}\quad V(M \mid M > 0) = \frac{z - (2r-1)(1-z)z^r - z^{2r}}{(1-z)^2}.$$

When $r = 1$, the conditional expectation is seen to be 1 and the conditional variance 0. This is in accord with the fact that the decision rule $r = 1$ ensures that each ascertained sibship will consist of exactly one affected child among $k$ total children.
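The closed forms for the conditional mean and variance can be checked against direct summation over the frequency function of $M$ given $M > 0$. The following is a minimal sketch; the values of $z$ and $r$ are arbitrary.

```python
def moments_direct(z: float, r: int):
    """Mean and variance of M given M > 0, by summing the frequency function."""
    probs = {j: (1 - z) * z ** (j - 1) for j in range(1, r)}
    probs[r] = z ** (r - 1)
    mean = sum(j * pr for j, pr in probs.items())
    var = sum(j * j * pr for j, pr in probs.items()) - mean ** 2
    return mean, var


def moments_closed(z: float, r: int):
    """E(M | M > 0) = (1 - z^r)/(1 - z) and the closed-form variance."""
    mean = (1 - z ** r) / (1 - z)
    var = (z - (2 * r - 1) * (1 - z) * z ** r - z ** (2 * r)) / (1 - z) ** 2
    return mean, var


for r in range(1, 6):
    print(r, moments_direct(0.3, r), moments_closed(0.3, r))
```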

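For $w$ sibships of parents terminating reproduction with the first $r$ affected children ($r > 1$), the likelihood function and likelihood equations are

$$L = \prod_{k}\prod_{0<j<r}\left[\frac{(1-a)a^k\binom{k}{j}p^jq^{k-j}}{z}\right]^{w_{jk}} \prod_k \left[\frac{a^k\binom{k-1}{r-1}p^rq^{k-r}}{z}\right]^{w_{rk}},$$

where $w_r$ is the number of sibships with $r$ affected children, and

$$\begin{aligned}
S_a &= \frac{n}{a} - \frac{w - w_r}{1-a} - \frac{w}{a(1-aq)} = 0,\\
S_p &= \frac{m}{p} - \frac{n-m}{q} - \frac{w(1-a)}{p(1-aq)} = 0.
\end{aligned}$$

The solutions are the maximum likelihood estimates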

B. Incomplete Multiple Ascertainment

Under stoppage at the first $r$ affected children, incomplete multiple ascertainment, and the geometric–binomial–binomial model, the probabilities associated with an observed set of $w$ sibships will be conditional on $A > 0$. The probability that $A$ is greater than 0 may be shown to be

$$\Pr(A > 0) = y\left[1 - (zq')^r\right].$$

The likelihood function is hence given by
The likelihood equations do not yield explicit estimates of the three parameters. When $r = 1$, the parameters are not estimable, since each sibship must contain exactly one affected child and $w = w_1 = t = m$. For $r = 2$, the following relations among the estimates are obtained:
a n—m t — w)(1 — 4) 
(n+ w — (w — w,)ap 
Substituting these relations in $S_p = 0$ leads to a quadratic in $p$, the positive root of which is the required maximum likelihood estimate:


a and 


C. Mixed Populations

It may be anticipated that the stoppage rule itself will vary between couples. If information on the stoppage rule can be elicited from parents, and sibships thereby classified by the value of $r$ as well as by $A$, $M$ and $T$, a mechanism is available for investigating the joint effects of variable $r$ and variable sibship size on the estimation of the parameters. Letting $p_r$ be the frequency function of the number of affected children at which parents cease reproduction, the probability a sibship is of type $(r, j, k)$ and is ascertained $i$ times is given by


asr j<r, 


Letting $p_r = (1-b)b^{r-1}$ and setting $v = \sum_{r,i,j,k} r\,w_{rijk}$, the maximum likelihood


| 


n (2=*) - (2=2) 
p= and 4 = ———— 
n (2=*) n+w-— w, 
m— 
4 
| 
| 
n+ w — w,)p 2g’ 
m—np = ji —! + |. (15) 
m+w— wv, 1 
| 


The effects of the stoppage rule and of the method of ascertainment on the maximum likelihood estimates of $p$, the probability a child is affected with the condition under study, and $p'$, the probability an affected child is detected, may be compared in tabular form:

TABLE 1
ESTIMATES OF p AND p' BY ASCERTAINMENT PROCEDURE AND SELECTIVE STOPPAGE RULE

Method of ascertainment                       Stoppage rule    Estimates
Complete, all sibships in the population      none             p = m/n
Complete, all sibships with affected          none             …
  children ("proband" method)                 r                …
Incomplete and multiple                       none             …
                                              r = 2            …
IV. SUMMARY

1. A general model for the analysis of the distribution of qualitative characters in sibships is proposed.

2. Beginning with the treatment of potential sibship size as a random variable, a method is formulated for studying the joint distribution of realized size, number of affected children and number of independent ascertainments under sibship-composition-dependent termination rules for two types of ascertainment. Alternative assumptions on the conditional distributions of the three variables are discussed.

3. Representing sibship size as a geometric and as a Poisson variable, and the number affected and the number of independent ascertainments of a given sibship as binomials, maximum likelihood estimates of the parameters are obtained for two ascertainment procedures for the case of no selective termination of reproduction. The results are a direct generalization of Fisher's proband and sib methods, where sibship size is treated as a constant and/or where 'a posteriori' size distributions are utilized.

4. Under the same model, maximum likelihood estimates of the parameters are obtained for the case of termination of reproduction at the birth of the first r affected children.

5. The model is extended to include the case of a variable termination rule.
ON THE ANALYSIS OF REPEATED-MEASUREMENTS
EXPERIMENTS¹


M. B. Danford, Harry M. Hughes and R. C.


School of Aviation Medicine, USAF 
Brooks Air Force Base, Texas, U.S. A. 


1. STATEMENT OF THE PROBLEM 


Suppose there are $I$ levels of some treatment, with $n_i$ subjects exposed to the $i$th level, and each subject measured on some characteristic periodically for $K$ times post treatment. Let $y_{ijk}$ be the measurement made at the $k$th time on the $j$th individual at the $i$th treatment level. Then there is associated with each subject at a given $i$th treatment level a vector of $K$ measurements $[y_{ijk}]$, $k = 1$ to $K$. Denote the average over the population from which the subjects were drawn by $E(y_{ijk}) = \nu_{ik}$. This average represents all fixed effects and may be decomposed into $\mu + \alpha_i + \gamma_k + \delta_{ik}$. The random departure of $y_{ijk}$ from $\nu_{ik}$ may be denoted by $m_{ijk}$, so that

$$y_{ijk} = \nu_{ik} + m_{ijk}, \qquad (1)$$

where, as convenient, we may further break down $m_{ijk}$ into $b_{ij} + g_{ijk}$, or into $b_{ij} + f_{ijk} + e_{ijk}$, to account for the subject effect, the subject by time interaction, and the replication error. We assume that the only effect of treatment is on the means, so that the covariances may be denoted

$$\operatorname{Cov}(y_{ijk}, y_{i'j'k'}) = \begin{cases} \sigma_{kk'} & \text{if } i = i' \text{ and } j = j',\\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$

The model equation can then be written as

$$y_{ijk} = \mu + \alpha_i + b_{ij} + \gamma_k + \delta_{ik} + e_{ijk}, \qquad i = 1, 2, \ldots, I;\; j = 1, 2, \ldots, n_i;\; k = 1, 2, \ldots, K, \qquad (3)$$


Presented at the Fourth International Biometric Conference, Ottawa, 1958. The contents re- 
flect the personal views of the authors and are not to be construed as a statement of official Air Force 
policy. Publication was supported in part by a grant from the United States National Science 
Foundation. 
with the usual side conditions

$$\sum_i n_i\alpha_i = 0, \quad \sum_k \gamma_k = 0, \quad \sum_i n_i\delta_{ik} = \sum_k \delta_{ik} = 0, \quad E(b_{ij}) = 0, \quad E(e_{ijk}) = 0. \qquad (4)$$

In the above equations, fixed effects are denoted by Greek letters and random effects by Latin letters. The symbols are defined as:

$\mu$ = overall mean,
$\alpha_i$ = effect of $i$th treatment level,
$b_{ij}$ = effect of $j$th individual measured at $i$th treatment level,
$\gamma_k$ = $k$th time effect,
$\delta_{ik}$ = treatment by time interaction effect,
$e_{ijk}$ = individual by time interaction effect plus usual random error component.


The validity of the assumption that the variances and covariances are the same over the various treatments needs to be checked for each experimental condition, of course. The procedure for testing this assumption has been given by Box [1950] as an extension to the multivariate situation of Bartlett's homogeneity of variance test (see Box [1949]). The test criterion is

$$M = (N - I)\log_e |A_{kk'}| - \sum_i (n_i - 1)\log_e |S_{kk'i}|, \qquad (5)$$

where $N = \sum_i n_i$ is the total number of subjects,

$$A_{kk'} = (N - I)^{-1} \sum_i \sum_j (y_{ijk} - \bar y_{i\cdot k})(y_{ijk'} - \bar y_{i\cdot k'}) \qquad (6)$$

is the covariance between the $k$th and $k'$th times corrected for treatment means, and

$$S_{kk'i} = (n_i - 1)^{-1} \sum_j (y_{ijk} - \bar y_{i\cdot k})(y_{ijk'} - \bar y_{i\cdot k'}) \qquad (7)$$

is the unbiased estimate, based on $n_i - 1$ degrees of freedom, of the covariance between the $k$th and $k'$th times for the $i$th treatment group. A dot subscript in equations (6) and (7), as elsewhere, denotes an average. Then $(1 - A_1)M$ is distributed approximately as $\chi^2$ with $f_1$ degrees of freedom, where

$$A_1 = \frac{2K^2 + 3K - 1}{6(K + 1)(I - 1)}\left(\sum_i \frac{1}{n_i - 1} - \frac{1}{N - I}\right) \qquad (8)$$

and
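$$f_1 = \tfrac{1}{2}(I - 1)K(K + 1). \qquad (9)$$

A minimal NumPy sketch of the test in (5) to (9). It assumes each treatment group is supplied as an $n_i \times K$ array of the $K$ repeated measurements on its subjects. Log-determinants are taken with `numpy.linalg.slogdet` for numerical stability, and natural logarithms are used as in (5). The random data in the usage lines only mimic the group sizes of Table 1 with $K = 4$ times; they are not the study's measurements. The returned statistic $(1 - A_1)M$ would be referred to a $\chi^2$ table with $f_1$ degrees of freedom.

```python
import numpy as np


def box_m(groups):
    """Box's M test of equal covariance matrices across I treatment groups.

    groups: list of arrays, the i-th of shape (n_i, K), with n_i - 1 >= K.
    Returns (M, A1, f1, (1 - A1) * M).
    """
    I = len(groups)
    K = groups[0].shape[1]
    n = np.array([g.shape[0] for g in groups])
    N = n.sum()
    S = [np.cov(g, rowvar=False, ddof=1) for g in groups]        # (7)
    A = sum((ni - 1) * Si for ni, Si in zip(n, S)) / (N - I)     # (6), pooled
    logdet_A = np.linalg.slogdet(A)[1]
    M = (N - I) * logdet_A - sum((ni - 1) * np.linalg.slogdet(Si)[1]
                                 for ni, Si in zip(n, S))        # (5)
    A1 = ((2 * K**2 + 3 * K - 1) / (6 * (K + 1) * (I - 1))
          * (np.sum(1.0 / (n - 1)) - 1.0 / (N - I)))             # (8)
    f1 = (I - 1) * K * (K + 1) / 2                               # (9)
    return M, A1, f1, (1 - A1) * M


# Simulated data with the Table 1 group sizes (not the study's data):
rng = np.random.default_rng(0)
groups = [rng.normal(size=(ni, 4)) for ni in (6, 14, 15, 10)]
M, A1, f1, stat = box_m(groups)
```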
In what follows, it is assumed that the $y_{ijk}$ have a joint normal distribution.

The problems at hand are to obtain tests of hypotheses about treatment, time, and treatment by time interaction effects. The difficulties associated with performing a univariate analysis of variance in a mixed-model two-way set-up with replications within cells have been demonstrated by Scheffé [1956]. He showed that the usual sums of squares are all independent except for the random main effect and interaction pair, and that all but the sum of squares for fixed main effects are distributed as $\chi^2$. However, the univariate analysis is valid for testing all effects under the more restrictive assumption of equal variances and covariances between times, i.e., between fixed effects (see Danford and Hughes [1957]). The design outlined in the opening paragraph is more complicated than the two-way set-up, yet the problems raised by Scheffé carry over into any mixed-model situation. Among repeated measurements taken in time there is frequently serial correlation: measurements taken closer together in time are more highly correlated than those taken further apart.

In the section that follows we sketch a particular example to help fix ideas on the experimental design. This is followed in Section 3 by a discussion of univariate procedures, with an application of these techniques to the data of the example. Section 4 is then concerned with multivariate techniques: the assumptions behind them, their application to the data, and a comparison of the univariate and multivariate tests. A summary is given in Section 5.

The raw scores for the example are listed in a table in the appendix. 


TABLE 1
NUMBER OF SUBJECTS WITH AVERAGE AGE BY TREATMENT GROUP

Treatment group    No. of subjects    Average age
control                   6               64.5
25-50 r                  14               67.4
75-100 r                 15               60.2
125-200 r                10               57.3
(Total)                  45               62.4 (average)


2. EXAMPLE


Forty-five individuals, all suffering from certain cancerous lesions, were trained on a psychomotor testing device. Of these 45 subjects, 6 were used as controls and 39 were given varying amounts of whole-body x-radiation. The radiation dose, the number of subjects in each of the 4 dose (treatment) groups, and the average age of the subjects in each group are given in Table 1. The physicians believed that the control subjects would not be helped by the treatment, owing to the location, type, or progress of the disease.

After the initial training period, each subject was measured 4 times a day; the average of the 4 trials for each subject for each day was taken as the basic score for analysis. The data are shown in the table in the appendix.² The averages for each group for each day, including the scores for the day immediately preceding the irradiation, are graphed in Figure 1. Immediately following the training on the device, the 39 subjects were given the indicated dose of x-radiation. Observations were then made for the 10 consecutive days following the irradiation; for the controls, observations were taken for 10 consecutive days after the completion of the initial training on the device.

FIGURE 1
TREATMENT GROUP MEANS PLOTTED AGAINST DAYS POST-IRRADIATION
(Plot not reproduced. Horizontal axis: time in days post-irradiation. Treatment groups: 1 = control, 2 = 25-50 r, 3 = 75-100 r, 4 = 125-200 r.)

²These data were accumulated by Lt. Colonel Robert B. Payne of the Department of Experimental Psychology, School of Aviation Medicine, USAF, in conjunction with the Department of

The purpose of the experiment was to ascertain whether or not whole-body x-radiation affected psychomotor performance as measured by this device, and if so, whether or not the effects were dose dependent.

The experimental design left much to be desired as far as control of extraneous variables was concerned. Two uncontrolled variables that easily come to mind are differences in age and in state of health. Treatment groups obviously could not be balanced on age or on state of health, since each individual was given the amount of radiation believed necessary for his condition. The age range for all treatment groups was from 45 to 76 years, and the average age for all subjects was about 62. The rate of learning and sustained performance did not appear to be age dependent, nor was it believed that state of health as such appreciably affected learning rate and sustained performance. Despite the various shortcomings of the design, the example can be used to point up the kind of problem with which we are concerned. It will serve the main purpose of the paper: a demonstration of the techniques and procedures available for the analysis of repeated measurements taken on the same individuals.


3. UNIVARIATE PROCEDURES 


If one assumes equal variances and covariances for the K times, then (see Hughes & Danford [1958]) all of the usual sums of squares are distributed as $\chi^2$ variates and are mutually independent. Under such an assumption, which will be referred to as the symmetry case, a univariate analysis of variance is valid. That is to say, when it is assumed that

$$\operatorname{Cov}(y_{ijk}, y_{i'j'k'}) = \begin{cases} \sigma^2, & i = i',\ j = j',\ k = k',\\ \rho\sigma^2, & i = i',\ j = j',\ k \ne k',\\ 0 & \text{otherwise}, \end{cases} \qquad (10)$$

in lieu of (2), the expected values of the mean squares are as indicated in Table 2, and the proper error terms to be used for testing the effects of interest are apparent.

It should be observed that the analysis for treatment effects (shown 
above the line in Table 2) is a one-way analysis of variance and that 
the test for these fixed effects is valid regardless of the assumption 
about variances and covariances among the K times, except that equal 
variances for the treatment groups are still assumed. 
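The partition in Table 2 can be sketched as follows, assuming (hypothetically) that the data are held as one array per treatment group with subjects in rows and the K times in columns; unequal group sizes $n_i$ are allowed. The function name `repeated_measures_anova` is illustrative only. Treatments are tested against Error (a), and time and interaction against Error (b), as the expected mean squares indicate.

```python
import numpy as np

def repeated_measures_anova(groups):
    """Univariate analysis of variance of Table 2 (symmetry case).

    groups: list of arrays, group i of shape (n_i, K): subjects x times.
    Returns (df, ms, F) as dictionaries keyed by source of variation.
    """
    I = len(groups)
    K = groups[0].shape[1]
    n = np.array([g.shape[0] for g in groups])
    N = n.sum()
    y = np.vstack(groups)
    grand = y.mean()                                          # ybar_...
    group_means = np.array([g.mean() for g in groups])        # ybar_i..
    time_means = y.mean(axis=0)                               # ybar_..k
    cell_means = np.array([g.mean(axis=0) for g in groups])   # ybar_i.k

    ss = {
        "Treatment": K * np.sum(n * (group_means - grand) ** 2),
        "Error (a)": K * sum(np.sum((g.mean(axis=1) - gm) ** 2)
                             for g, gm in zip(groups, group_means)),
        "Time": N * np.sum((time_means - grand) ** 2),
        "Treatment x Time": np.sum(n[:, None] * (cell_means - group_means[:, None]
                                                  - time_means[None, :] + grand) ** 2),
    }
    # Error (b): subject x time interaction, pooled over groups
    ss["Error (b)"] = np.sum((y - grand) ** 2) - sum(ss.values())
    df = {"Treatment": I - 1, "Error (a)": N - I, "Time": K - 1,
          "Treatment x Time": (I - 1) * (K - 1), "Error (b)": (N - I) * (K - 1)}
    ms = {src: ss[src] / df[src] for src in df}
    F = {"Treatment": ms["Treatment"] / ms["Error (a)"],
         "Time": ms["Time"] / ms["Error (b)"],
         "Treatment x Time": ms["Treatment x Time"] / ms["Error (b)"]}
    return df, ms, F
```

Because every subject is measured at all K times, the treatment, time, and interaction parts are orthogonal even with unequal $n_i$, so Error (b) can be obtained by subtraction.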
TABLE 2
ANALYSIS OF VARIANCE
Symmetry case for repeated measurement design

Source               d.f.              E(M.S.)
Treatment            I − 1             σ²[1 + (K − 1)ρ] + φ(α)†
Error (a)*           N − I             σ²[1 + (K − 1)ρ]
------------------------------------------------------------
Time                 K − 1             σ²(1 − ρ) + φ(γ)
Treatment × Time     (I − 1)(K − 1)    σ²(1 − ρ) + φ(δ)
Error (b)**          (N − I)(K − 1)    σ²(1 − ρ)

Total                NK − 1

*Error (a) is the variation among subjects (summed over times) within treatment groups, pooled over groups.
**Error (b) is the subject × time interaction, pooled over treatment groups.
†φ( ) = non-negative function of fixed effects. Under the hypotheses tested, φ( ) = 0.


The results of the univariate analysis of variance for the present 
example are shown in Table 3. The inferences of primary concern 
are that the treatment means are not significantly different, the time 
means differ significantly and the dose by time interaction is not signi- 
ficant. 

TABLE 3
ANALYSIS OF VARIANCE OF PSYCHOMOTOR PERFORMANCE TESTS
FOR 45 X-IRRADIATED SUBJECTS

Source              d.f.        M.S.         F        P
Treatment             3     72,184.8      1.25     N.S.
Error (a)            41     57,803.9
Time                  9     18,989.6     60.95    <.001
Treatment × Time     27        424.6      1.36     N.S.
Error (b)           369        311.6
Total               449

Fixing attention on the treatment by time interaction, one observes from Figure 1 that the linear part of the mean response curves over time accounts for most of the variation. When the slope for each subject in each group is computed and a one-way analysis of variance is performed on these slopes, the inference is that the mean slopes of the treatment groups do not differ significantly. This one-way analysis of variance is given in Table 4. The same result can be found by testing the dose by time (linear) part of the dose by time interaction against the subject by time (linear) part of Error (b) in Table 3.
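The slope analysis just described can be sketched as two small steps: a least-squares slope of score on day for each subject, then a one-way analysis of variance of those slopes across treatment groups. The function names and the use of days 1 to 10 as the time scale are assumptions for illustration.

```python
import numpy as np

def subject_slopes(scores, days):
    """Least-squares slope of score on day for each subject (rows of `scores`)."""
    t = np.asarray(days, dtype=float)
    tc = t - t.mean()
    # sum_k y_k (t_k - tbar) / sum_k (t_k - tbar)^2, one value per row
    return scores @ tc / np.sum(tc ** 2)

def one_way_anova(samples):
    """Return (MS between, MS within, F, d.f. between, d.f. within)."""
    pooled = np.concatenate(samples)
    grand = pooled.mean()
    I, N = len(samples), pooled.size
    ss_between = sum(x.size * (x.mean() - grand) ** 2 for x in samples)
    ss_within = sum(np.sum((x - x.mean()) ** 2) for x in samples)
    ms_b, ms_w = ss_between / (I - 1), ss_within / (N - I)
    return ms_b, ms_w, ms_b / ms_w, I - 1, N - I
```

With one array per group, `one_way_anova([subject_slopes(g, range(1, 11)) for g in groups])` gives the layout of Table 4.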


TABLE 4
ANALYSIS OF VARIANCE OF SLOPES OF INDIVIDUAL RESPONSES IN TIME

Source        d.f.     M.S.       F       P
Treatment       3     28.42     1.86    N.S.
Error          41     15.28
Total          44

GROUP MEAN SLOPES

                   control   25-50 r   75-100 r   125-200 r
Mean slopes          9.76      6.73      5.49       5.76
No. of subjects         6        14        15         10


The overall F-test from Table 4 allows one to make the same in- 
ference on the interaction as that made from the general interaction 
test in Table 3. Yet the slope for the control group appears to be 
different from the slopes for the treatment groups: by Student's t-test,
this mean slope of 9.76 is significantly different from the average slope 
for the three radiation treatment groups; and by the Duncan [1957] 
range test, the mean slope for the control group is significantly larger 
than the mean slope for the 75-100 r treatment group. 
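The Student's t comparison can be approximated from the printed summaries, under an assumption the text does not state: that the contrast is the control mean slope minus the unweighted average of the three irradiated-group mean slopes, using the Error mean square of Table 4 (15.28 on 41 d.f.). A sketch, with the hypothetical helper `contrast_t`:

```python
import math

# Group summaries as printed in Table 4 (control first).
means = [9.76, 6.73, 5.49, 5.76]
sizes = [6, 14, 15, 10]
ms_error, df_error = 15.28, 41

def contrast_t(coefs, means, sizes, ms_error):
    """t statistic for the contrast sum(c_i * mean_i) with a pooled error mean square."""
    estimate = sum(c * m for c, m in zip(coefs, means))
    se = math.sqrt(ms_error * sum(c * c / n for c, n in zip(coefs, sizes)))
    return estimate / se

# Assumed contrast: control minus the unweighted average of the irradiated groups.
t = contrast_t([1, -1 / 3, -1 / 3, -1 / 3], means, sizes, ms_error)
```

With these inputs t is about 2.19, beyond the two-sided 5 percent point of t on 41 d.f. (about 2.02); weighting the irradiated groups by their sizes instead changes t only slightly.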

For later reference, note that by combining the Error (a) and Error (b) mean squares from Table 3 one obtains a variance-component estimate of $\rho$:

$$\hat\rho = \frac{\text{Error (a)} - \text{Error (b)}}{\text{Error (a)} + (K - 1)\,\text{Error (b)}} = .9486.$$
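The arithmetic behind Table 3 and the estimate $\hat\rho$ can be checked directly from the printed mean squares. Solving the two error lines of Table 2 also gives $\hat\sigma^2 = [\text{Error (a)} + (K - 1)\,\text{Error (b)}]/K$ and $\hat\rho\hat\sigma^2 = [\text{Error (a)} - \text{Error (b)}]/K$; the variable names below are illustrative.

```python
# Mean squares from Table 3 (K = 10 times).
K = 10
ms = {"Treatment": 72184.8, "Error (a)": 57803.9, "Time": 18989.6,
      "Treatment x Time": 424.6, "Error (b)": 311.6}

# F-ratios with the error terms indicated by Table 2.
F_treatment = ms["Treatment"] / ms["Error (a)"]
F_time = ms["Time"] / ms["Error (b)"]
F_interaction = ms["Treatment x Time"] / ms["Error (b)"]

# E(Error (a)) = sigma^2 [1 + (K - 1) rho],  E(Error (b)) = sigma^2 (1 - rho)
sigma2_hat = (ms["Error (a)"] + (K - 1) * ms["Error (b)"]) / K
cov_hat = (ms["Error (a)"] - ms["Error (b)"]) / K      # rho * sigma^2
rho_hat = cov_hat / sigma2_hat
```

The implied values, about 6060.8 and 5749.2, agree with the average variance and average covariance noted under Table 5.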


4. MULTIVARIATE PROCEDURES 


From a standpoint of labor involved, and frequently in the interpre- 
tation of the results, univariate analyses are preferred over multi- 
variate procedures. Yet the assumptions necessary to perform a valid 
univariate analysis may not be fulfilled. In the data of the present
example, the assumption that the treatment affects only the means
is tenable but we shall see that the symmetry assumptions as reflected 
in (10) are not met. In this section a multivariate test for symmetry 
will be demonstrated, followed by tests for time and interaction effects 
under the multivariate approach and the assumption (2) on the co- 
variances. Finally a serial correlation model that is more general 
than the symmetry assumption (10), but more specific and meaningful 
than the general assumption (2), will be discussed. 

The observed variances and covariances among the 10 post-radiation times in the example, corrected for radiation treatment means, were calculated by equation (6) and are presented in Table 5. Note that the expression in (6) is summed over groups; as indicated after equation (4), the assumption that the only treatment effect is on the means needs to be tested. In the present example it is not possible to make the check, since in two of the groups there are more times than subjects, leaving too few degrees of freedom for estimating those particular within-group covariance matrices. Numerically this fact appears as a zero value for $|s_{kk'i}|$ of the affected groups.


TABLE 5
VARIANCES AND COVARIANCES OF THE 10 POST-RADIATION TIMES

         1     2     3     4     5     6     7     8     9    10
  1   4685  4522  5096  5202  5035  4746  5026  5090  4847  4933
  2         4799  5287  5408  5250  5026  5323  5280  5238  5228
  3               6151  6192  6055  5773  6120  6092  6027  6132
  4                     6525  6231  5923  6285  6204  6137  6252
  5                           6171  5850  6185  6150  6105  6175
  6                                 5745  5941  6000  5950  5959
  7                                       6479  6332  6371  6382
  8                                             6523  6398  6390
  9                                                   6798  6566
 10                                                         6753

Note that the average variance = 6060.8 (= σ² under symmetry assumptions), the average covariance = 5749.2 (= ρσ² under symmetry assumptions), and ρ̂ = (average covariance)/(average variance) = .9486 = estimate of intra-class correlation found from the analysis of variance, Table 3.


The assumption of equal variances and equal covariances can be tested on the matrix given in Table 5 by a procedure proposed by Wilks [1946] using the likelihood ratio criterion of Neyman and Pearson, as modified by Box [1949]. When $A_{kk'}$ is the covariance between the kth and k'th variate, from (6), $\bar A_{kk}$ is the average of the variances, and $\bar A_{kk'}$ is the average of all the covariances, then the criterion for testing the hypothesis of equal variances and equal covariances is

$$\Lambda = |A| / |\bar A|, \qquad (11)$$

where $A$ is the $K \times K$ matrix with elements $A_{kk'}$ and $\bar A$ is a matrix of the same size with diagonal elements $\bar A_{kk}$ and non-diagonal elements $\bar A_{kk'}$. Next, compute

$$M = -(N - I)\log_e \Lambda, \qquad (12)$$

$(N - I)$ being the degrees of freedom of the covariances tested. This statistic $M$, when multiplied by $(1 - A_2)$, is approximately distributed as $\chi^2$ with $f_2$ degrees of freedom, where

$$A_2 = \frac{K(K + 1)^2(2K - 3)}{6(N - I)(K - 1)(K^2 + K - 4)}, \qquad f_2 = \frac{K^2 + K - 4}{2}, \qquad (13)$$

and K is the number of variates.

For the matrix of Table 5, $\chi^2 = 182$ with 53 d.f. From this result we infer that the assumption of equal variances and covariances is not likely to be fulfilled. Thus, the tests for the time effects and group × time interaction effects by the univariate analysis of variance procedures given in Table 2 are not valid.
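A minimal sketch of the criterion (11) to (13), taking a pooled covariance matrix such as $A_{kk'}$ of (6) and its degrees of freedom; the function name is hypothetical, and log-determinants are used instead of raw determinants for numerical safety.

```python
import numpy as np

def compound_symmetry_test(A, df):
    """Test of equal variances and equal covariances, eqs. (11)-(13).

    A: K x K covariance matrix (e.g. A_kk' of (6)); df: its d.f. (N - I).
    Returns (Lambda, (1 - A2) * M, f2).
    """
    A = np.asarray(A, dtype=float)
    K = A.shape[0]
    off_diagonal = ~np.eye(K, dtype=bool)
    # A-bar: average variance on the diagonal, average covariance elsewhere
    A_bar = np.full((K, K), A[off_diagonal].mean())
    np.fill_diagonal(A_bar, np.diag(A).mean())
    log_lam = np.linalg.slogdet(A)[1] - np.linalg.slogdet(A_bar)[1]   # (11)
    M = -df * log_lam                                                 # (12)
    A2 = K * (K + 1) ** 2 * (2 * K - 3) / (6 * df * (K - 1) * (K**2 + K - 4))
    f2 = (K**2 + K - 4) / 2                                           # (13)
    return np.exp(log_lam), (1 - A2) * M, f2
```

A matrix that already has equal variances and equal covariances gives $\Lambda = 1$; with K = 10 the degrees of freedom are $f_2 = 53$, as quoted above.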

The test for treatment effects, as previously noted, is valid because treatment main effects depend only on treatment group means; their variance is a function only of the total of the elements of the sums-of-products matrix and not of the individual elements. This is another way of saying that the test for treatment effects is not a multivariate problem.

For multivariate tests of time and interaction effects, we may employ either of two general approaches: the likelihood criterion, or an approach we shall refer to as Hotelling's $T$. The $T$ approach consists of (a) finding a set of contrasts with estimates $z_1, \ldots, z_c$ having zero expectations when no effect exists and non-zero expectations when an effect does exist, (b) deriving the covariance matrix of these contrasts and unbiased estimates $v_{kk'}$ of its elements such that the $v_{kk'}$ and the contrasts $z_k$ are statistically independent, and (c) computing the quadratic form

$$T = \frac{n - c + 1}{cn}\sum_{k=1}^{c}\sum_{k'=1}^{c} z_k\, v^{kk'} z_{k'}, \qquad (14)$$

where n is the number of degrees of freedom in each estimate $v_{kk'}$ and $v^{kk'}$ is the kth row, k'th column element in the inverse of the matrix $(v_{kk'})$. In general c will be the number of degrees of freedom associated with the effect tested. If there is no effect, the statistic $T$ in (14) will have a variance-ratio distribution and can be referred to F tables with c and n − c + 1 degrees of freedom (Rao [1952a]).
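A minimal sketch of the statistic in (14), given the contrast estimates $z_k$, an unbiased estimate of their covariance matrix on n degrees of freedom, and assuming independence between the two as required in step (b); the function name `hotelling_T` is illustrative.

```python
import numpy as np

def hotelling_T(z, V, n):
    """Statistic T of (14).

    z: contrast estimates z_1..z_c; V: estimate (v_kk') of their covariance
    matrix on n d.f.  Returns (T, c, n - c + 1), the F-table d.f. of T.
    """
    z = np.asarray(z, dtype=float)
    c = z.size
    quad = z @ np.linalg.solve(np.asarray(V, dtype=float), z)   # sum z_k v^{kk'} z_k'
    return (n - c + 1) / (c * n) * quad, c, n - c + 1
```

For a single contrast (c = 1) the statistic reduces to $z^2/v$, the square of the usual t statistic on n degrees of freedom.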

The likelihood criterion is defined as

$$\Lambda = \frac{|\text{sum of products for error}|}{|\text{sum of products for error} + \text{sum of products for effect being tested}|}, \qquad (15)$$

where the vertical bars denote determinants and "sum of products" refers to the matrix of centralized cross-products summed over the observations (Wilks [1932], Bartlett [1938]). Let

$m$ = degrees of freedom for effect tested + degrees of freedom for error,
$q$ = degrees of freedom for effect tested,
$p$ = number of variates,
$A_3 = (p + q + 1)/2m$,
$f_3 = pq$,
$M = -m\log_e \Lambda$.

Then $(1 - A_3)M$ can be referred to tables of $\chi^2$ with $f_3$ degrees of freedom. An approximate F-test, which affords better approximations when m is small or p and q are large, is also available. If p or q is 1 or 2, an exact F-test exists (see Box [1950], Rao [1952b]). This likelihood formulation is not valid if any two observations within the same vector have an expectation parameter in common.
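A sketch of (15) with the chi-square approximation, taking the error and effect sums-of-products matrices as inputs; the function name and argument layout are assumptions for illustration.

```python
import numpy as np

def likelihood_criterion(E, H, error_df, q):
    """Likelihood criterion (15) with its chi-square approximation.

    E, H: p x p sums of products for error and for the effect tested;
    error_df: d.f. for error; q: d.f. for the effect tested.
    Returns (Lambda, (1 - A3) * M, f3).
    """
    E = np.asarray(E, dtype=float)
    H = np.asarray(H, dtype=float)
    p = E.shape[0]
    m = q + error_df
    log_lam = np.linalg.slogdet(E)[1] - np.linalg.slogdet(E + H)[1]
    A3 = (p + q + 1) / (2 * m)
    M = -m * log_lam
    return np.exp(log_lam), (1 - A3) * M, p * q
```

Note that $(1 - A_3)M = -[m - (p + q + 1)/2]\log_e\Lambda$, which is the familiar Bartlett form of the multiplier.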


Multivariate Tests for Time Effect 


To obtain a test of the hypothesis of no time effect, $\gamma_1 = \gamma_2 = \cdots = \gamma_K = 0$, one can eliminate the random individual component and obtain a set of linearly independent contrasts by choosing the Kth time as an arbitrary base and subtracting this observation from each of the remaining observations in time for a given subject. The test result is the same regardless of which time is chosen as the base. We are concerned with the analysis of

$$d_{ijk} = y_{ijk} - y_{ijK}, \qquad k = 1, 2, \ldots, K - 1, \qquad (16)$$

where

$$E(d_{ijk}) = \gamma_k - \gamma_K + \delta_{ik} - \delta_{iK} + E(g_{ijk} - g_{ijK}) = \gamma_k - \gamma_K + \delta_{ik} - \delta_{iK}. \qquad (17)$$

The K − 1 contrasts which form a basis of time space are then estimated by

$$\bar d_{..k} = \bar y_{..k} - \bar y_{..K}. \qquad (18)$$

Using the conditions from equation (4),

$$E(\bar d_{..k}) = \gamma_k - \gamma_K + (\bar\delta_{.k} - \bar\delta_{.K}) = \gamma_k - \gamma_K, \qquad (19)$$

a function only of the fixed time effects. Next we observe that

$$\operatorname{Cov}(\bar d_{..k}, \bar d_{..k'}) = N^{-1}(\sigma_{kk'} - \sigma_{kK} - \sigma_{Kk'} + \sigma_{KK}) = N^{-1}\theta_{kk'}. \qquad (20)$$


Estimates of this covariance can be obtained in the following manner. Let

$$W_{ijkk'} = (d_{ijk} - \bar d_{i.k})(d_{ijk'} - \bar d_{i.k'}). \qquad (21)$$

Then it follows under the above assumptions that

$$E(W_{ijkk'}) = (1 - n_i^{-1})\,\theta_{kk'}, \qquad (22)$$

and

$$E(W_{..kk'}) = (N - I)N^{-1}\theta_{kk'}. \qquad (23)$$

The elements of the variance-covariance matrix for the time contrasts $\bar d_{..k}$ are estimated by the quantity

$$\hat b_{kk'} = W_{..kk'}/(N - I) \qquad (24)$$

with N − I degrees of freedom. Furthermore, the set $\{d_{ijk} - \bar d_{i.k}\}$ is linearly independent of the set $\{\bar d_{i.k}\}$; hence any function of the latter is distributed independently of $\hat b_{kk'}$, a function of the former. In particular, the contrasts $\bar d_{..k}$ are distributed independently of $\hat b_{kk'}$, so we have the conditions for (14) with $z_k = \bar d_{..k}$, $c = K - 1$, $n = N - I$, and $v_{kk'} = \hat b_{kk'}$. Therefore we can test the hypothesis of no time effects by computing

$$F_1 = \frac{N - I - K + 2}{(K - 1)(N - I)}\sum_{k=1}^{K-1}\sum_{k'=1}^{K-1}\bar d_{..k}\,\hat b^{kk'}\,\bar d_{..k'} \qquad (25)$$

and referring it to a Snedecor-Fisher variance-ratio F-table with K − 1 and N − I − K + 2 degrees of freedom. In the example we get $F_1 = 14.6$ with (9, 33) d.f., a highly significant result (P < .001).
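The steps (16) to (25) can be sketched from raw data as below, assuming one array per treatment group (subjects by times); the function name is hypothetical. The dot-average convention of the text is followed, so $W_{..kk'}$ is an average over the N subjects.

```python
import numpy as np

def time_effect_test(groups):
    """Multivariate test of no time effect, eqs. (16)-(25).

    groups: list of arrays, group i of shape (n_i, K).
    Returns (F1, K - 1, N - I - K + 2).
    """
    I = len(groups)
    K = groups[0].shape[1]
    N = sum(g.shape[0] for g in groups)
    d = [g[:, :-1] - g[:, [-1]] for g in groups]          # (16): base = Kth time
    d_bar = np.vstack(d).mean(axis=0)                      # (18): mean over all N subjects
    # (21), (23): W_..kk' = average over the N subjects of products of
    # deviations from the group means d_i.k
    W_avg = sum((x - x.mean(axis=0)).T @ (x - x.mean(axis=0)) for x in d) / N
    b_hat = W_avg / (N - I)                                # (24)
    c, n = K - 1, N - I
    F1 = (n - c + 1) / (c * n) * (d_bar @ np.linalg.solve(b_hat, d_bar))   # (25)
    return F1, c, n - c + 1
```

Reordering the columns so that a different time serves as the base changes the contrasts only by a nonsingular linear transformation, so $F_1$ is unchanged, as stated above.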

The univariate test gave an F-ratio of 60.95 with K − 1 = 9 and (N − I)(K − 1) = 369 degrees of freedom. With the number of variates and the number of treatment levels fixed, both test statistics are asymptotically distributed as F with (K − 1, ∞) degrees of freedom as N, the total number of subjects, increases.

The likelihood criterion cannot be used on the variates $y_{ijk}$, since $y_{ijk}$ and $y_{ijk'}$ have the parameters $\mu$ and $\alpha_i$ in common. This difficulty can be avoided by applying the criterion to the variables $\{d_{ijk}\}$ defined in (16), considering $\gamma_k^* = \gamma_k - \gamma_K$ and $\delta_{ik}^* = \delta_{ik} - \delta_{iK}$ as the parameters in the model. In this case the element in the kth row and k'th column of the numerator of (15) is $\sum_i\sum_j W_{ijkk'}$, and the corresponding element of the denominator of (15) is

$$\sum_i\sum_j(d_{ijk} - \bar d_{i.k} + \bar d_{..k})(d_{ijk'} - \bar d_{i.k'} + \bar d_{..k'}) = N W_{..kk'} + N\bar d_{..k}\bar d_{..k'} = N(N - I)\hat b_{kk'} + N\bar d_{..k}\bar d_{..k'}. \qquad (26)$$

Thus the likelihood criterion for time effect becomes

$$\Lambda = \frac{|(N - I)\hat b_{kk'}|}{|(N - I)\hat b_{kk'} + \bar d_{..k}\bar d_{..k'}|}, \qquad k, k' = 1, \ldots, K - 1, \qquad (27)$$

with m = N − I + 1, q = 1, and p = K − 1. For this case of q = 1, the procedure is to form the test statistic

$$\frac{1 - \Lambda}{\Lambda}\cdot\frac{m - p}{p}, \qquad (28)$$

which in our application reduces to

$$\frac{N - I - K + 2}{(K - 1)(N - I)}\sum_{k}\sum_{k'}\bar d_{..k}\,\hat b^{kk'}\,\bar d_{..k'} = F_1, \qquad (29)$$

so that the $T$ and likelihood procedures result in the same statistic when there is only one degree of freedom for the effect tested.
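The equality (29) can be checked numerically: the sketch below computes the time-effect F both through the likelihood criterion (26) to (28) and through the $T$ statistic (25). It rests on the determinant identity $|E + N\bar d\bar d'| = |E|(1 + N\bar d'E^{-1}\bar d)$. The function name and data layout are assumptions for illustration.

```python
import numpy as np

def time_effect_two_ways(groups):
    """F for the time effect via the likelihood criterion (26)-(28) and via T (25).

    groups: list of arrays, group i of shape (n_i, K). Returns (F_lambda, F_T).
    """
    I = len(groups)
    K = groups[0].shape[1]
    N = sum(g.shape[0] for g in groups)
    d = [g[:, :-1] - g[:, [-1]] for g in groups]            # (16)
    d_bar = np.vstack(d).mean(axis=0)                        # (18)
    # Error sum of products: sum_i sum_j W_ijkk' = N (N - I) b_hat_kk'
    E = sum((x - x.mean(axis=0)).T @ (x - x.mean(axis=0)) for x in d)
    # (26)-(27): the denominator adds N * d_bar d_bar'
    log_lam = (np.linalg.slogdet(E)[1]
               - np.linalg.slogdet(E + N * np.outer(d_bar, d_bar))[1])
    lam = np.exp(log_lam)
    m, p = N - I + 1, K - 1
    F_lambda = (1 - lam) / lam * (m - p) / p                 # (28)
    b_hat = E / (N * (N - I))                                # (24)
    c, n = K - 1, N - I
    F_T = (n - c + 1) / (c * n) * (d_bar @ np.linalg.solve(b_hat, d_bar))   # (25)
    return F_lambda, F_T
```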


Multivariate Tests for Treatment-Time Interaction

To obtain a set of linearly independent contrasts covering interaction space, we may choose an additional arbitrary treatment base. For example, the (I − 1)(K − 1) means obtained by subtracting from each $\bar d_{i.k}$ the kth mean of the Ith dose group have no linear constraints:

$$\bar d^*_{i.k} = \bar d_{i.k} - \bar d_{I.k} = \bar y_{i.k} - \bar y_{i.K} - \bar y_{I.k} + \bar y_{I.K}, \qquad i = 1, \ldots, I - 1;\ k = 1, \ldots, K - 1. \qquad (30)$$

Then

$$E(\bar d^*_{i.k}) = \delta_{ik} - \delta_{iK} - \delta_{Ik} + \delta_{IK} \qquad (31)$$

and

$$\operatorname{Cov}(\bar d^*_{i.k}, \bar d^*_{i'.k'}) = h_{ii'}\,\theta_{kk'}, \qquad (32)$$

where $\theta_{kk'}$ is defined in (20) and

$$h_{ii'} = \begin{cases} 1/n_i + 1/n_I, & i = i',\\ 1/n_I & \text{otherwise}. \end{cases} \qquad (33)$$

If we denote $V_{ii'kk'} = N h_{ii'}\hat b_{kk'}$ and use (24), we obtain unbiased estimates of (32); if further we take the $\bar d^*_{i.k}$ as the contrasts $z$ of (14), then by the independence noted after (24) we have the conditions as set forth for (14) for the quadratic form

$$F_2 = \frac{N - IK + K}{(I - 1)(K - 1)(N - I)}\sum_{i,i'}\sum_{k,k'}\bar d^*_{i.k}\,V^{ii'kk'}\,\bar d^*_{i'.k'}, \qquad (35)$$

where V is the matrix with elements $V_{ii'kk'}$, $V^{ii'kk'}$ are the elements of its inverse, n = N − I, and c = (I − 1)(K − 1). While the V-matrix yields unbiased estimates, it does not have a Wishart distribution, so that $F_2$ is not properly referable to an F-table with [(I − 1)(K − 1), N − IK + K] degrees of freedom. If one inadvertently did so for the numerical example above, he would note that $F_2 = .387$ with 27 and 15 degrees of freedom, which is a significantly low value at the 5 percent level. The explanation rests in the fact that $F_2$ may be represented as the sum of correlated ratios of sums of products, and the low value reflects this correlation. We proceed to derive the reduction of $F_2$ to such an expression.

The matrix $H = (h_{ii'})$ is positive definite and hence may be written as a product $H = GG'$, so that $G^{-1}H(G^{-1})' = I$. The vector $u$ formed by defining

$$u_{ik} = \sum_{m} g^{im}\,\bar d^*_{m.k}, \qquad (36)$$

where $g^{im}$ is the ith row, mth column element of $G^{-1}$, is then a set of (I − 1) groups of (K − 1) variables each, each group having covariance matrix $\theta = (\theta_{kk'})$ and each group being independent of every other group, since

$$\operatorname{Cov}(u_{ik}, u_{i'k'}) = \theta_{kk'}\,(G^{-1}HG'^{-1})_{ii'} = \begin{cases}\theta_{kk'}, & i = i',\\ 0 & \text{otherwise}.\end{cases} \qquad (37)$$

By the transformation (36), $F_2$ may be written as

$$F_2 = \frac{N - IK + K}{(I - 1)(K - 1)(N - I)}\sum_{i=1}^{I-1} u_i'(N\hat b)^{-1}u_i, \qquad (38)$$

where $\hat b = (\hat b_{kk'})$ and $u_i$ is the vector $(u_{i1}, \ldots, u_{i,K-1})$, independent of $u_{i'}$ and having covariance $\theta$. Each of the terms

$$T_i = u_i'(N\hat b)^{-1}u_i \qquad (39)$$

has, with appropriate constant factor, an F-distribution with (K − 1, N − I − K + 2) degrees of freedom, but the $T_i$ are correlated through the common element $\hat b$. Hence the significantly low value of $F_2$. Note, however, that each $T_i$ is analogous to a ratio of independent sums of squares, and that $\sum T_i$ corresponds to the addition of several ratios having a common denominator and mutually independent numerators. In the univariate case the ratios and the numerator degrees of freedom are additive, so that by analogy we may refer

$$F_3 = \frac{N - I - K + 2}{N - IK + K}\,F_2 = \frac{N - I - K + 2}{(I - 1)(K - 1)(N - I)}\sum_{i=1}^{I-1}T_i \qquad (40)$$

to F-tables with (I − 1)(K − 1) and N − I − K + 2 degrees of freedom. In the example $F_3 = .851$ with 27 and 33 degrees of freedom, a nonsignificant result. The authors would welcome comments relative to the distribution of the quantity in (40).
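The reduction from (35) to (38) to (40) can be illustrated numerically. The sketch below builds V as the Kronecker product of $H$ with $N\hat b$, and, as one admissible choice of $G$ (an assumption, since any factor $H = GG'$ will do), uses the Cholesky factor to form the $u_i$; the two routes give the same quadratic form. The function name and the choice of the last group as base are illustrative. With N = 45, I = 4, K = 10 the ratio $F_3/F_2$ is 33/15 = 2.2, which links the printed values .387 and .851.

```python
import numpy as np

def interaction_tests(groups):
    """Treatment x time statistics F2 (35) and F3 (40); last group is the base.

    groups: list of arrays, group i of shape (n_i, K).
    Returns (F2, F3, quadratic form of (35), sum of the T_i of (39)).
    """
    I = len(groups)
    K = groups[0].shape[1]
    n = np.array([g.shape[0] for g in groups], dtype=float)
    N = int(n.sum())
    d = [g[:, :-1] - g[:, [-1]] for g in groups]                 # (16)
    E = sum((x - x.mean(axis=0)).T @ (x - x.mean(axis=0)) for x in d)
    b_hat = E / (N * (N - I))                                    # (24)
    means = np.array([x.mean(axis=0) for x in d])                # dbar_i.k
    d_star = means[:-1] - means[-1]                              # (30), (I-1) x (K-1)
    # (33): 1/n_i + 1/n_I on the diagonal, 1/n_I elsewhere
    H = np.full((I - 1, I - 1), 1.0 / n[-1]) + np.diag(1.0 / n[:-1])
    V = np.kron(H, N * b_hat)                                    # V_ii'kk' = N h_ii' b_kk'
    z = d_star.ravel()                                           # ordered (i, k)
    quad = z @ np.linalg.solve(V, z)
    c = (I - 1) * (K - 1)
    F2 = (N - I * K + K) / (c * (N - I)) * quad                  # (35)
    G = np.linalg.cholesky(H)                                    # H = G G'
    u = np.linalg.solve(G, d_star)                               # (36): rows are u_i
    sum_T = sum(ui @ np.linalg.solve(N * b_hat, ui) for ui in u) # (39)
    F3 = (N - I - K + 2) / (c * (N - I)) * sum_T                 # (40)
    return F2, F3, quad, sum_T
```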

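As noted previously, the likelihood criterion may be applied to the variables $\{d_{ijk}\}$, the ratio (15) taking the form

$$\Lambda = \frac{\left|\sum_i\sum_j W_{ijkk'}\right|}{\left|\sum_i\sum_j W_{ijkk'} + \sum_i n_i(\bar d_{i.k} - \bar d_{..k})(\bar d_{i.k'} - \bar d_{..k'})\right|}, \qquad (41)$$

with m = N − 1, q = I − 1, and p = K − 1. For the numerical example, Λ = .5513, and thus $(1 - A_3)M = -37.5\log_e .5513 = 22.33$, which as $\chi^2$ with (I − 1)(K − 1) = 27 d.f. implies that the interaction effects are not significant.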
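The figure 37.5 is $(1 - A_3)m = m - (p + q + 1)/2$ with m = 44, q = 3, p = 9. A short check of the quoted arithmetic, using only the values given in the text:

```python
import math

# Quantities quoted in the text for the interaction test.
N, I, K = 45, 4, 10
lam = 0.5513

m, q, p = N - 1, I - 1, K - 1          # m = 44, q = 3, p = 9, as in (41)
A3 = (p + q + 1) / (2 * m)             # (15)
M = -m * math.log(lam)                 # (15)
chi2 = (1 - A3) * M                    # equals -37.5 * log_e(.5513)
df = p * q
```

The 5 percent point of $\chi^2$ on 27 d.f. is about 40.1, so 22.33 is well short of significance.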

These multivariate tests are to be compared with the univariate nonsignificant result: F = 1.36 with (I − 1)(K − 1) = 27 and (N − I)(K − 1) = 369 degrees of freedom. As N becomes large, all the tests for interaction asymptotically depend on an F-distribution with [(I − 1)(K − 1), ∞] degrees of freedom.

Serial Correlation 

In addition to apparently unequal variances (see Table 5) there also seem to be unequal correlations between the various times. For certain biochemical determinations made on a living organism, one might explain this serial correlation as resulting from a naturally occurring periodicity: e.g., blood sugar determinations should reflect this phenomenon of higher correlations between measurements made closer together in time than for observations further removed in time.

To study the serial correlation one can modify the assumptions in (10) to allow for unequal correlations between the various times but still assume the variances to be equal; that is, assume the random subject component $b_{ij}$ to be independent of the residual $g_{ijk}$, so that

$$\operatorname{Cov}(b_{ij}, g_{ijk}) = 0, \qquad (42)$$

and further assume the time correlation to depend only on the lag $|k - k'|$, so that

$$\operatorname{Cov}(y_{ijk}, y_{i'j'k'}) = \begin{cases}\sigma_b^2 + \sigma_g^2, & i = i',\ j = j',\ k = k',\\ \sigma_b^2 + \rho_{|k-k'|}\sigma_g^2, & i = i',\ j = j',\ k \ne k',\\ 0 & \text{otherwise}.\end{cases} \qquad (43)$$

Using (6) and under the assumptions of (43), calculate

FIGURE 2
(Plot of the lag correlation estimates from (45); not reproduced.)
TABLE A
AVERAGE SCORE (4 TRIALS/DAY) FOR EACH SUBJECT
ON PSYCHOMOTOR TESTING DEVICE

                     Time (Days Post Irradiation)
Subject   Pre    1    2    3    4    5    6    7    8    9   10

Controls
  1       191  223  242  248  266  274  272  279  286  287  286
  2        64   72   81   66   92 4114  126  123  134  148  140
  3       206  172  214  239  265  265  262  274  258  288  289
  4       155  171  191  203  219  237  237  220  252  260  245
  5        85  138  204  213  224  247  246  259  255  374  284
  6        15   22   24    2   38   41   46   62   62   79   74

25-50 r
  7        53   53  102  104  105  125  122  150   93  127  132
  8        33   45   50   54   44   47   45   61   50   60  8&2
  9        16   47   45   34   37 2461   28  438   40   45
 10       121  167  188  209  224  229  230  269  264  249  268
 11       179  193  206  210  221  234  224  255  246  225  229
 12       114   91  154  152  155  174  196  207  208  229  173
 13        92  115  133  136  148  159  146  180  148  168  169
 14        84   32   97   47   87  103  124  110  162  187
 15        30   38   37   40   48   61   64    6    8   91   90
 16        51   66  131  148  181  172 4195  170  158  215
 17       188  210  221  251  256  268  260  281  286  290  296
 18       137  167  172  212  168  213  190  196  211  213  224
 19       108   23   18   30    2   40   57   37  656
 20       205  234  260  269  274  282  282  290  298  304  308

75-100 r
 21       181  206  199  237  219  237  232  251  247  254  250
 22       178  208  222  237  255  253  254  276  254  267  275
 23       190  224  224  261  249  291  293  204  295  299  305
 24       127  119  149  196  203  211  207  241  220  188  219
 25        94  144  169  164  182  189  188  164  181  142  152
 26       148  170  202  181  184  186  207  184  195  168  163
 27        99   93  122  145  180  167  153  165  144  156  167
 28       207  237  243  281  273  281  279  294  307  305  305
 29       188  208  235  249  265  271  263  272  285  283  290
 30       140  187  199  205  231  227  228  246  245  263  262
 31       109   95  102   96  135 4135 4L1l  146  131  162  171
 32        69   46   67    2  438   55   55   77   7%    7   76
 33        69   95  137   99   95  108  129  134  133 4131   91
 34        51   59   76  101   72   72 4107  128  120  133
 35       156  186  198  201  205  210  217  217  219  223  229
TABLE A (Continued)

                     Time (Days Post Irradiation)
Subject   Pre    1    2    3    4    5    6    7    8    9

125-250 r
 36       201  202  229  232  224  237  217  268  244  275
 38        86   54   75   75
 39       115  158  168  175  188  164  184  195  194  206
 40       183  175  217  235  241  251  229  241  233  233
 41       206  215  197  207  226  244
 42       710 105107  92  101  103   78   87   57   70
 43       172  213  276  273  267  286  283  290
 44       224  258  248  257  257  267  260  279  299  289
 45       216  257  291 3306S 295s  312

FIGURE 3
LAG CORRELATION ESTIMATES: COVARIANCES BETWEEN TIMES CORRECTED FOR GROUP AND SUBJECT EFFECTS, FOR TWO VALUES OF ρ₉
(Plot not reproduced. Horizontal axis: lag.)
Then the quantity (45) yields an estimate of $(\sigma_b^2 + \rho_\ell\sigma_g^2)/(\sigma_b^2 + \sigma_g^2)$. These estimates from (45) are plotted in Figure 2. The correlation remains high owing to the large component $\sigma_b^2$. To get a more direct appraisal of the lag correlations $\rho_\ell$, one can remove the average-person component by defining

$$y^*_{ijk} = y_{ijk} - \bar y_{ij.}, \qquad (46)$$

then correct for group effects and pool covariances by computing

$$C_{kk'} = (N - I)^{-1}\sum_i\sum_j (y^*_{ijk} - \bar y^*_{i.k})(y^*_{ijk'} - \bar y^*_{i.k'}). \qquad (47)$$
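A sketch of (46) and (47) from raw data, assuming one array per treatment group (subjects by times) and the divisor N − I used in (6); the function name is hypothetical. Removing each subject's own mean makes every row of $C$ sum to zero and removes any constant subject effect entirely, which is why $\sigma_b^2$ no longer enters.

```python
import numpy as np

def subject_centered_covariance(groups):
    """C_kk' of (47): subtract each subject's mean over times (46), then pool
    deviations from the group-by-time means over groups on N - I d.f."""
    I = len(groups)
    N = sum(g.shape[0] for g in groups)
    C = 0.0
    for g in groups:
        ystar = g - g.mean(axis=1, keepdims=True)      # (46): y_ijk - ybar_ij.
        dev = ystar - ystar.mean(axis=0)                # correct for group effects
        C = C + dev.T @ dev
    return C / (N - I)
```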


Equating these $C_{kk'}$ to their theoretical values as deduced from assumptions (43) yields a set of equations in $\sigma_g^2, \rho_1, \ldots, \rho_{K-1}$ having one degree of indeterminacy. These may be reduced to express each estimate in terms of the estimate $r_9$ of $\rho_9$; for the data of the example the expressions are

$$\begin{aligned}
r_1 &= .756 + .244r_9, & r_5 &= .503 + .497r_9,\\
r_2 &= .700 + .300r_9, & r_6 &= .417 + .583r_9,\\
r_3 &= .648 + .352r_9, & r_7 &= .395 + .605r_9,\\
r_4 &= .580 + .420r_9, & r_8 &= .093 + .907r_9,
\end{aligned}$$
$$\hat\sigma_g^2 = 736.6/(1 - r_9). \qquad (48)$$

Note that the estimates decrease as lag increases for any value of $r_9$ less than one. The relationships are graphically portrayed in Figure 3 for the two cases $r_9 = 0$ and $r_9 = .9r_8$.
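The two cases plotted in Figure 3 can be evaluated from the slopes in (48). The sketch takes each intercept as one minus the corresponding slope, which agrees with every intercept that is legible in (48); the second case solves $r_9 = .9\,r_8$ with $r_8 = (1 - b_8) + b_8 r_9$ for $r_9$. Variable names are illustrative.

```python
# Slopes b_l of (48) for lags l = 1, ..., 8; the intercepts are taken as 1 - b_l.
b = [0.244, 0.300, 0.352, 0.420, 0.497, 0.583, 0.605, 0.907]

def lag_correlations(r9):
    """Estimates r_1, ..., r_8 from (48) for a chosen r_9, followed by r_9 itself."""
    return [(1 - bl) + bl * r9 for bl in b] + [r9]

# First case of Figure 3: r_9 = 0.
case_zero = lag_correlations(0.0)

# Second case: r_9 = .9 r_8, with r_8 = (1 - b_8) + b_8 r_9, solved for r_9.
r9 = 0.9 * (1 - b[-1]) / (1 - 0.9 * b[-1])
case_ratio = lag_correlations(r9)
```

The second case gives $r_9 \approx .456$. In both cases the estimates fall steadily with lag, as noted above.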


5. SUMMARY 


An example is given involving repeated measurements on the same 
individuals over time. The assumptions for and techniques of the 
usual univariate analysis of variance procedure for such a set-up are 
given. The assumption of equal variances and covariances, the so- 
called symmetry assumption, is tested and it is concluded that this 
assumption is not fulfilled for these data. The multivariate procedures 
are then given for the situations where a valid univariate analysis is 
not justified. It is noted that asymptotically, the univariate and multi- 
variate tests are identical. The conclusions are that for this example, 
essentially the same inferences are made from the univariate and multi- 
variate analyses. 
INTRA- AND INTER-BLOCK ANALYSIS FOR FACTORIALS
IN INCOMPLETE BLOCK DESIGNS¹


R. A. BRADLEY’, R. E. AND C. Y. KRAMER
Virginia Agricultural Experiment Station 
of the Virginia Polytechnic Institute 
Blacksburg, Virginia, U.S. A. 


1. INTRODUCTION AND SUMMARY


This paper deals with extensions of the use of factorial treatment 
combinations in classes of incomplete block designs. We review the 
status of such usage and then note the situations considered in this paper. 

Factorial treatment combinations have frequently been used in 
experimental designs in applications in both agriculture and industry. 
Blocking through confounding patterns has produced some of the effects 
of incomplete block designs. But factorial treatment combinations 
may be used within more conventional incomplete block designs. 
Cornish [1938] considered factorials in balanced incomplete block de- 
signs. Harshbarger [1954] used a 2° factorial in a latinized rectangular 
lattice design. More recently Kramer and Bradley [1957a, 1957b] 
and Zelen [1958] used factorials in Group Divisible (GD), partially 
balanced incomplete block (p.b.i.b.) designs. Confounding in factorials 
has formed the conceptual basis for the generation of classes of incomplete
block designs. The works of Nair and Rao [1948] and Nair
[1953] are fundamental, and they used confounding in asymmetrical 
factorials to generate GD p.b.i.b. designs as a special case of their 
system. We are interested in the direct use of factorials in incomplete 
block designs. 

It is the main objective of this paper to extend the recent work of 
Kramer and Bradley to the use of factorials in the several suitable 
classes of two-associate class, p.b.i.b. designs not previously con- 
sidered. A second objective is to complete the earlier work through 
utilization of inter-block information in combined intra- and inter-block
estimators. We consider specifically recovery of inter-block
information in GD designs, both intra- and combined intra- and inter- 


¹Research sponsored by the Statistics Branch, Office of Naval Research, U. S. Navy. Repro-
duction in whole or in part is permitted for any purpose of the United States Government. 
block analyses for Latin-Square, Sub-type L₂ (LS₂) designs and Latin-
Square, Sub-type L₃ (LS₃) designs and discuss other p.b.i.b. designs
with two associate classes. In each situation tests of significance and 
estimators of factorial effects are obtained along with efficiencies of 
contrasts. 

Only basic results are given in this paper. We work essentially 
from treatment estimators for the basic designs and show how they 
yield the necessary information for the analyses for factorials. Usual 
concepts of experimental design and factorials apply and attention in 
application must be given to choice and blocking of experimental units, 
applicability of models, estimation of weights in the recovery of inter- 
block information, and presentation of results including tables and 
figures supplementary to basic analysis of variance tables. Methods of 
analysis follow usual patterns and numerical examples are not in- 
cluded. 

This paper essentially completes consideration of the use of factorials 
in classes of designs catalogued by Bose, Clatworthy, and Shrikhande 
[1954]. Factorial treatment combinations may be used in the Singular, 
Semi-Regular, and Regular subclasses of GD designs, in the LS₂ designs,
and in the LS₃ designs. The list of Simple p.b.i.b. designs may contain
some designs suitable for use of factorials but it appears that each design 
may have to be given individual consideration and this has not been 
done. The Triangular designs in general provide undesirable con- 
founding patterns in use with factorials and the Cyclic designs have 
treatment numbers that are primes. 


2. P.B.I.B. DESIGNS WITH TWO ASSOCIATE CLASSES 


We summarize definitions, notations, and theory for the analysis of p.b.i.b. designs with two associate classes in this section. The summary is largely based on material given by Bose, Clatworthy, and Shrikhande.
In accordance with standard design notation, we use


v: The number of treatments (or treatment combinations), 
r: The number of blocks containing any given treatment, 
b: The number of blocks, and 

k: The number of experimental units in each block. 


In regard to the association scheme, 


(i) Two treatments are either first or second associates,
(ii) Two treatments which are ith associates occur together in exactly $\lambda_i$ blocks, i = 1, 2,
(iii) Each treatment has exactly $n_i$ ith associates, i = 1, 2, $n_1 + n_2 = v - 1$, and
(iv) $p^i_{jk}$ is the number of treatments common to the jth associates of the first and the kth associates of the second of a pair of treatments that are ith associates, i, j, k = 1, 2; $p^i_{jk} = p^i_{kj}$.


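Constants Δ, H, $c_1$, and $c_2$ are catalogued with each design. They are related to the design parameters defined above as follows:

$$k^2\Delta = (a + \lambda_1)(a + \lambda_2) + (\lambda_1 - \lambda_2)[(a + \lambda_2)f - (a + \lambda_1)g], \qquad (1)$$
$$kH = (2a + \lambda_1 + \lambda_2) + (f - g)(\lambda_1 - \lambda_2), \qquad (2)$$
$$k\Delta c_1 = \lambda_1(a + \lambda_2) + (\lambda_1 - \lambda_2)(f\lambda_2 - g\lambda_1), \qquad (3)$$
$$k\Delta c_2 = \lambda_2(a + \lambda_1) + (\lambda_1 - \lambda_2)(f\lambda_2 - g\lambda_1). \qquad (4)$$

In (1) to (4),

$$a = r(k - 1), \quad f = p^1_{12}, \quad g = p^2_{12}. \qquad (5)$$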
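The constants (1) to (5) can be computed directly from the design parameters. The sketch below also evaluates the intra-block forms of the variance results (9) to (11) that appear later in this section (W' = 0, so r' = r), with the error variance factored out. The function name is hypothetical. As a check it uses a small group divisible design: four treatments in groups {1, 2} and {3, 4}, blocks {1, 3}, {1, 4}, {2, 3}, {2, 4}, so that r = k = 2, λ₁ = 0, λ₂ = 1, $p^1_{12} = 0$, $p^2_{12} = 1$, n₁ = 1, n₂ = 2. Its intra-block variances of differences, 2 for first and 1.5 for second associates, can be verified by hand.

```python
def pbib_constants(r, k, lam1, lam2, f, g, n1, n2):
    """Constants (1)-(5) and intra-block variances (9)-(11), error variance factored out.

    lam1, lam2: lambda_1, lambda_2; f = p^1_12, g = p^2_12; n1, n2: numbers of
    first and second associates.  Intra-block case (W' = 0), so r' = r.
    """
    a = r * (k - 1)                                                         # (5)
    dl = lam1 - lam2
    delta = ((a + lam1) * (a + lam2) + dl * ((a + lam2) * f - (a + lam1) * g)) / k**2  # (1)
    H = (2 * a + lam1 + lam2 + (f - g) * dl) / k                            # (2)
    common = dl * (f * lam2 - g * lam1)
    c = {1: (lam1 * (a + lam2) + common) / (k * delta),                     # (3)
         2: (lam2 * (a + lam1) + common) / (k * delta)}                     # (4)
    v = 1 + n1 + n2
    var_diff = {gam: 2 * (k - c[gam]) / a for gam in (1, 2)}                # (9)
    var_single = (n1 * (k - c[1]) + n2 * (k - c[2])) / (v * a)              # (10)
    n = {1: n1, 2: n2}
    # (11): (nu, delta) = (1, 2) for first associates, (2, 1) for second
    cov = {nu: (n[3 - nu] * (c[nu] - c[3 - nu]) - (k - c[nu])) / (v * a) for nu in (1, 2)}
    return delta, H, c, var_diff, var_single, cov
```

The results satisfy $V(t_{ij} - t_{i'j'}) = 2V(t_{ij}) - 2\operatorname{Cov}(t_{ij}, t_{i'j'})$ for both kinds of associates, which is how (10) and (11) follow from (9).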


Suppose that v = mn; we do this in anticipation of the introduction of factorial treatment combinations. Then the model, suitable for the recovery of inter-block information, is that an observation

$$y_{ijs} = \mu + \tau_{ij} + \beta_s + \epsilon_{ijs}, \qquad (6)$$

where i = 1, ..., m, j = 1, ..., n, and s = 1, ..., b, and treatment (ij) occurs in block s. In (6), $y_{ijs}$ is expressed in terms of the grand mean $\mu$, the fixed (Model I) treatment effect $\tau_{ij}$, the inter-block error $\beta_s$ (random with zero expectation and variance $\sigma_b^2$), and the intra-block error $\epsilon_{ijs}$ (random with zero expectation and variance $\sigma^2$). The usual restriction $\sum_i\sum_j \tau_{ij} = 0$ applies, as do the usual assumptions of independence and normality, although the latter of these does not appear to be very restrictive in view of the known robustness of the analysis of variance.

Analysis of variance and treatment estimation follow the system given by Bose et al. Combined intra- and inter-block estimators of treatment effects are

$$t'_{ij} = [(k - d_2)P_{ij} + (d_1 - d_2)S_1(P_{ij})]/r'(k - 1). \qquad (7)$$

In (7) we have the following definitions:

$$r' = r[W + \{W'/(k - 1)\}], \qquad W = 1/\sigma^2, \qquad W' = 1/(\sigma^2 + k\sigma_b^2),$$

with $d_i$ ($i = 1, 2$) the combined-analysis counterparts of $c_i$, that is, (3) and (4), with $\Delta$ from (1), evaluated with $a$ replaced by $r\{(k - 1)W + W'\}/(W - W')$, and

$$P_{ij} = WQ_{ij} + W'Q'_{ij},$$

$$Q_{ij} = T_{ij} - (B_{ij}/k), \qquad Q'_{ij} = (B_{ij}/k) - (G/bk). \qquad (8)$$

The $(ij)$th treatment total is $T_{ij}$, $B_{ij}$ is the total of block totals for blocks containing treatment $(ij)$, and $G$ is the grand total of all observations. $S_1(P_{ij})$ is the sum of the $P_{ij}$ for all first associates of treatment $(ij)$. The weights $W$ and $W'$ have been assumed known; in practice estimators $w$ and $w'$ of these weights are obtained as described in the reference. Intra-block estimators may be obtained from (7) by taking $W' = 0$ throughout.

Following the estimation process, we are interested in the variances and covariances of treatment contrasts. Bose and co-authors give the result that

$$V(t'_{ij} - t'_{i'j'}) = 2(k - d_\gamma)/r'(k - 1), \qquad (9)$$

where $\gamma = 1$ or 2 according as treatments $(ij)$ and $(i'j')$ are first or second associates. From (9), and since $V(\sum_i\sum_j t'_{ij}) = 0$ because $\sum_i\sum_j t'_{ij} = 0$, it is easy to show that

$$V(t'_{ij}) = [n_1(k - d_1) + n_2(k - d_2)]/vr'(k - 1) \qquad (10)$$

and

$$\mathrm{Cov}(t'_{ij}, t'_{i'j'}) = [n_\delta(d_\gamma - d_\delta) - (k - d_\gamma)]/vr'(k - 1), \qquad (11)$$

where $(\gamma, \delta) = (1, 2)$ or $(2, 1)$ depending on whether treatments $(ij)$ and $(i'j')$ are first or second associates respectively. These results (10) and (11) are fundamental to the consideration of factorials, since factorial effect estimators are linear combinations of the $t'_{ij}$.
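The algebra behind (10) and (11) can be checked numerically. The sketch below uses our own helper, `treatment_covariances`, which is not from the source. It evaluates the right-hand sides of (10) and (11) for arbitrary illustrative values of $n_1$, $n_2$, $k$, $d_1$, $d_2$ and $r'$, and then confirms two consequences:

- $2\{V(t'_{ij}) - \mathrm{Cov}\}$ reproduces (9) for each class of associates;
- $V(t'_{ij}) + n_1\mathrm{Cov}_1 + n_2\mathrm{Cov}_2 = 0$, which is the restriction $\sum\sum t'_{ij} = 0$ restated.

```python
from fractions import Fraction


def treatment_covariances(n1, n2, k, d1, d2, r_prime):
    """V(t'_ij) and its covariances with first and second associates, from (10)-(11)."""
    v = 1 + n1 + n2
    denom = v * r_prime * (k - 1)
    var = (n1 * (k - d1) + n2 * (k - d2)) / denom   # (10)
    cov1 = (n2 * (d1 - d2) - (k - d1)) / denom      # (11) with (gamma, delta) = (1, 2)
    cov2 = (n1 * (d2 - d1) - (k - d2)) / denom      # (11) with (gamma, delta) = (2, 1)
    return var, cov1, cov2


n1, n2, k = 1, 4, 3
d1, d2, r_prime = Fraction(0), Fraction(1, 2), Fraction(2)
var, cov1, cov2 = treatment_covariances(n1, n2, k, d1, d2, r_prime)

# V(t'_ij - t'_i'j') = 2(V - Cov) must reproduce (9) for each class of associates.
assert 2 * (var - cov1) == 2 * (k - d1) / (r_prime * (k - 1))
assert 2 * (var - cov2) == 2 * (k - d2) / (r_prime * (k - 1))
# The t'_ij sum to zero, so V + n1*Cov1 + n2*Cov2 = 0.
assert var + n1 * cov1 + n2 * cov2 == 0
```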

With the recovery of inter-block information in combined treatment estimators, main interest centers on the variances of treatment contrasts (and on the variances of factorial contrasts). However, Rao [1947] has shown that

$\chi_T^2$ [expression (12); not legible in the source scan]

has the $\chi^2$-distribution with $(v - 1)$ degrees of freedom. This depends on the weights $W$ and $W'$ being known without error, and is approximate when they are replaced by $w$ and $w'$ (obtained as estimators as given by Bose, Clatworthy, and Shrikhande [1954]). Again, if we set $W' = 0$ in (12) and estimate $1/W$ by the intra-block error mean square, $\chi_T^2/(v - 1)$ reduces to the usual $F$-statistic, the ratio of adjusted treatment mean square to error mean square. Partitioning of $\chi_T^2$ for factorials will be shown.
3. FACTORIALS IN P.B.I.B. DESIGNS WITH TWO ASSOCIATE CLASSES


Factorial treatment combinations are introduced into p.b.i.b. designs with two associate classes through consideration of a basic two-factor factorial with factors A and C at $m$ and $n$ levels respectively. The $(ij)$th treatment is now the treatment combination $A_iC_j$. The model (6) is reparameterized by writing

$$\tau_{ij} = \alpha_i + \gamma_j + \delta_{ij}, \qquad (13)$$

where $\sum_i \alpha_i = 0$, $\sum_j \gamma_j = 0$, and $\sum_i \delta_{ij} = \sum_j \delta_{ij} = 0$. The effect of $A_i$ is $\alpha_i$, of $C_j$ is $\gamma_j$, and $\delta_{ij}$ is the interaction effect of $A_i$ and $C_j$. The A- and C-factors may themselves be made up of factorial treatment combinations, so that we are not in any way limited to two-factor factorials.

However the factorial treatment combinations are assigned to the basic design treatments (and this has not yet been specified), it follows from (7) and (13) that estimators of factorial effects based on combined intra- and inter-block information are

$$a'_i = \frac{1}{n}\sum_j t'_{ij}, \qquad c'_j = \frac{1}{m}\sum_i t'_{ij}, \qquad d'_{ij} = t'_{ij} - a'_i - c'_j, \qquad (14)$$

with Latin letters used as estimators for effects designated by corresponding Greek letters. Thus factorial estimators are easily obtained from a two-way table of usual treatment estimators $t'_{ij}$.
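The two-way-table computation in (14) can be sketched as follows. The function name and the example table are our own. As a small robustness choice, the grand mean is removed first. When the $t'_{ij}$ already sum to zero, as the restriction in (6) implies, this reduces exactly to (14).

```python
def factorial_estimates(t):
    """Split an m x n table of combined estimates t'_ij into a'_i, c'_j and d'_ij (eq. 14).

    The grand mean is removed first; if the t'_ij sum to zero this is exactly (14).
    """
    m, n = len(t), len(t[0])
    grand = sum(map(sum, t)) / (m * n)
    a = [sum(row) / n - grand for row in t]                              # A-factor effects
    c = [sum(t[i][j] for i in range(m)) / m - grand for j in range(n)]   # C-factor effects
    d = [[t[i][j] - grand - a[i] - c[j] for j in range(n)] for i in range(m)]  # interactions
    return a, c, d


t = [[1.0, -1.0, 0.5],
     [-0.5, 0.0, 0.0]]
a, c, d = factorial_estimates(t)
```

Each row and each column of the resulting interaction table `d` sums to zero, mirroring the restrictions on $\delta_{ij}$ in (13).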

Variances and covariances of the factorial estimators in (14) may always be worked out on the basis of those of the $t'_{ij}$ in (10) and (11). But the objective is to assign the factorial treatment combinations to the basic design treatments in such a way that these variances and covariances are as simple as possible. For GD, LS₂, and LS₃ p.b.i.b. designs, it is possible to assign the factorial treatment combinations so that the following properties hold:

(i) A-factor contrasts have equal variances and are independent of each other and of other factorial effects if orthogonal A-factor contrasts are specified in the usual way for factorials.

(ii) C-factor contrasts have equal variances and are independent of each other and of other factorial effects as in (i).
(iii) AC-interaction contrasts have equal variances and are independent of each other and of other factorial effects, except for the LS₃ designs, for which some simple but special discussion is necessary.

(iv) $\chi_T^2$ may be partitioned, $\chi_T^2 = \chi_A^2 + \chi_C^2 + \chi_{AC}^2$, to yield values for A-factor, C-factor, and AC-interaction effects with degrees of freedom $(m - 1)$, $(n - 1)$, and $(m - 1)(n - 1)$, and such that these quantities are independent.

(v) $\chi_A^2$, $\chi_C^2$, and $\chi_{AC}^2$ may themselves be partitioned into the usual one-degree-of-freedom contrasts, except that care needs to be taken with $\chi_{AC}^2$ for the LS₃ designs.

For the classes of designs considered, we find

$$\chi_A^2 = \sum_i (a'_i)^2/K_A \qquad (15)$$

and

$$\chi_C^2 = \sum_j (c'_j)^2/K_C. \qquad (16)$$

In addition, for GD and LS₂ designs,

$$\chi_{AC}^2 = \sum_i\sum_j (d'_{ij})^2/K_{AC}, \qquad (17)$$

but a slightly more complicated form is required for LS₃ designs.
In the analysis of incomplete block designs we are interested primarily in variances of treatment and factorial contrasts. Here we have

$$V(a'_i - a'_{i'}) = 2K_A \qquad (18)$$

and

$$V(c'_j - c'_{j'}) = 2K_C. \qquad (19)$$

Also, for GD and LS₂ designs, the latter with $m = n$,

$$V(d'_{ij}) = (m - 1)(n - 1)K_{AC}/mn, \qquad (20)$$

$$\mathrm{Cov}(d'_{ij}, d'_{i'j}) = -(n - 1)K_{AC}/mn, \qquad (21)$$

$$\mathrm{Cov}(d'_{ij}, d'_{ij'}) = -(m - 1)K_{AC}/mn, \qquad (22)$$

and

$$\mathrm{Cov}(d'_{ij}, d'_{i'j'}) = K_{AC}/mn. \qquad (23)$$

LS₃ designs require special discussion in regard to interaction variances and covariances.

For factorials in complete blocks or in completely randomized designs, single-degree-of-freedom contrasts are often used. This can still be done for factorials in p.b.i.b. designs. Consider the A-factor in complete blocks or in completely randomized designs with estimators $a_i$ of $\alpha_i$. A contrast is of the form $\sum_i \xi_i a_i$ with $\sum_i \xi_i = 0$. The usual mean square with one degree of freedom is

$$N_A\Big(\sum_i \xi_i a_i\Big)^2\Big/\sum_i \xi_i^2 \qquad (24)$$

and the corresponding $\chi^2$ is

$$N_A\Big(\sum_i \xi_i a_i\Big)^2\Big/\sigma^2\sum_i \xi_i^2, \qquad (25)$$

where $N_A$ is the number of observations in the averages on which the $a_i$ are based and $\sigma^2$ is the true error variance. For the p.b.i.b. designs, $K_A$ corresponds to $\sigma^2/N_A$, and the procedure differs from the usual one only in this respect. Now

$$\chi^2 = \Big(\sum_i \xi_i a'_i\Big)^2\Big/K_A\sum_i \xi_i^2 \qquad (26)$$

with one degree of freedom. In practice, in both situations $K_A$ and $\sigma^2/N_A$ are estimated, the former by substitution of estimated weights $w$ and $w'$ for $W$ and $W'$, and the latter by $s^2/N_A$, where $s^2$ is the error mean square.

For a C-factor contrast, $\sum_j \eta_j c'_j$ with $\sum_j \eta_j = 0$, the same procedure applies and

$$\chi^2 = \Big(\sum_j \eta_j c'_j\Big)^2\Big/K_C\sum_j \eta_j^2 \qquad (27)$$

with one degree of freedom. Useful interaction contrasts are usually of the form $\sum_i\sum_j \xi_i\eta_j d'_{ij}$, with corresponding

$$\chi^2 = \Big(\sum_i\sum_j \xi_i\eta_j d'_{ij}\Big)^2\Big/K_{AC}\sum_i \xi_i^2\sum_j \eta_j^2 \qquad (28)$$

with one degree of freedom. For GD, LS₂, and LS₃ designs, (26) and (27) are applicable; (28) applies only for GD and LS₂ designs, and again special attention is necessary for LS₃ designs. These formulas permit the introduction of multi-factor factorials, consideration of trends over factor levels, and evaluation of all necessary interactions. Additional details are not necessary, since such contrasts with factorials are in general use.
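The single-degree-of-freedom tests (26) to (28) are simple ratios, sketched below. The function names are ours. `K` stands for $K_A$, $K_C$ or $K_{AC}$, which in practice would be evaluated at the estimated weights $w$, $w'$. The interaction form follows (28) and, as stated above, applies only to GD and LS₂ designs.

```python
def contrast_chi2(coeffs, estimates, K):
    """One-degree-of-freedom chi-square of (26) or (27).

    coeffs: contrast coefficients (xi_i or eta_j), which must sum to zero;
    estimates: the a'_i or c'_j; K: K_A or K_C.
    """
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    value = sum(x * e for x, e in zip(coeffs, estimates))
    return value ** 2 / (K * sum(x * x for x in coeffs))


def interaction_chi2(xi, eta, d, K_AC):
    """Chi-square of (28) for the contrast sum_i sum_j xi_i eta_j d'_ij (GD and LS2 only)."""
    value = sum(xi[i] * eta[j] * d[i][j] for i in range(len(xi)) for j in range(len(eta)))
    return value ** 2 / (K_AC * sum(x * x for x in xi) * sum(e * e for e in eta))


print(contrast_chi2([1, -1], [0.3, -0.3], K=0.02))
print(interaction_chi2([1, -1], [1, -1], [[0.1, -0.1], [-0.1, 0.1]], K_AC=0.01))
```

Each value would be referred to the $\chi^2$ distribution with one degree of freedom. That reference is only approximate once $w$ and $w'$ replace $W$ and $W'$.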

The efficiencies of the factorial contrasts in the GD designs may be obtained in comparison with their uses in other designs. Efficiencies in comparison with randomized block designs based only on the intra-block analysis were given earlier by Kramer and Bradley. Here we consider efficiencies using the combined estimators and, since some of the GD designs have blocks that may be grouped into replications and some do not, we obtain efficiencies in comparison with completely randomized designs. For the completely randomized designs, we compute expected variances and covariances corresponding to (18), ..., (23). They are expected in the sense that these variances and covariances depend on the assignments of treatment combinations to experimental units, and hence average or expected values must be obtained. It is assumed that the experimental units of the incomplete block designs are used, with $k$ experimental units containing the component $\beta_1$, $k$ containing $\beta_2$, and so on. The factorial treatment combinations are assigned randomly to the totality of $rv$ experimental units. Under these conditions, if $a_i$, $c_j$, and $d_{ij}$ are the appropriate factorial-effect estimators for the completely randomized design, we have

$$\frac{V(a_i - a_{i'})}{2m} = \frac{V(c_j - c_{j'})}{2n} = \frac{V(d_{ij})}{(m - 1)(n - 1)} = -\frac{\mathrm{Cov}(d_{ij}, d_{ij'})}{m - 1} = -\frac{\mathrm{Cov}(d_{ij}, d_{i'j})}{n - 1} = K, \qquad (29)$$

where

$$K = [(b - 1)W + b(k - 1)W']/rv(bk - 1)WW'. \qquad (30)$$

The appropriate efficiencies, based on comparisons of variances and covariances in (29) and (30) with corresponding ones in (18), ..., (23), are then

$$E_A = mK/K_A \qquad (31)$$

and

$$E_C = nK/K_C \qquad (32)$$

for GD, LS₂, and LS₃ designs, and

$$E_{AC} = mnK/K_{AC} \qquad (33)$$

for GD and LS₂ designs, with special results needed on $E_{AC}$ for LS₃ designs. These efficiencies are not less than unity (unity when $W = W'$) and also apply to corresponding single-degree-of-freedom contrasts as discussed above.
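A small sketch of (30) to (33) follows. The function names are ours, and exact fractions are used so the unity property is visible without rounding. $K$ requires $W' > 0$. The example is our own calculation: it takes the design $b = 4$, $k = 3$, $r = 2$, $v = 6$ ($m = 3$, $n = 2$) with $W = W' = 2$. In that case a p.b.i.b. design has no advantage, so we assume its constants equal the completely randomized values $K_A = 1/nrW$, $K_C = 1/mrW$ and $K_{AC} = 1/rW$. All three efficiencies should then equal 1.

```python
from fractions import Fraction


def crd_K(b, k, r, v, W, W_prime):
    """K of (30): expected-variance constant for the completely randomized comparison.

    Requires W_prime > 0 (a finite inter-block variance).
    """
    return ((b - 1) * W + b * (k - 1) * W_prime) / (r * v * (b * k - 1) * W * W_prime)


def efficiencies(m, n, K, K_A, K_C, K_AC):
    """E_A, E_C and E_AC of (31)-(33)."""
    return m * K / K_A, n * K / K_C, m * n * K / K_AC


W = Fraction(2)
K = crd_K(b=4, k=3, r=2, v=6, W=W, W_prime=W)
print(K, efficiencies(3, 2, K, 1 / (2 * 2 * W), 1 / (3 * 2 * W), 1 / (2 * W)))
```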

In this section, we have given the basic results for factorials in p.b.i.b. designs of the classes noted. For each of the classes of designs it remains to define the association of factorial treatment combinations with basic design treatments and to develop results for $K_A$, $K_C$, and $K_{AC}$. Some additional discussion will be given for the LS₃ designs on points of exception noted above. Estimation of weights will not be discussed, as this is fully covered by Bose, Clatworthy, and Shrikhande for each class of designs.


4. SPECIAL CONSIDERATIONS FOR GD, LS₂, AND LS₃ DESIGNS
GD Designs 

The intra-block analysis for factorials in GD designs has been 
discussed by Kramer and Bradley [1957a, 1957b] and by Zelen [1958] 
with the latter also showing inter-block estimation. These analyses 
were perhaps also anticipated by Nair and Rao [1948] as they used 
asymmetrical factorials in design construction. We complete the work 
here with our notes on the use of combined intra- and inter-block 
analyses. 

GD designs depend on a rectangular $m \times n$ association matrix, $v = mn$, such that each treatment has $(n - 1)$ first associates, those treatments in the same row of the matrix, and $n(m - 1)$ second associates, those treatments not in the same row of the matrix. The basic two-factor factorial set of treatment combinations is associated with the design treatments by letting the rows of the association matrix correspond to the $m$ levels of the A-factor and the columns of the matrix to the $n$ levels of the C-factor.

Quantities required for the GD designs are

$$K_A = k/n[v\lambda_2W + (rk - v\lambda_2)W'] \qquad (34)$$

and

$$mK_C = K_{AC} = k/[(rk - r + \lambda_1)W + (r - \lambda_1)W']. \qquad (35)$$

It may be useful to note that $S_1(P_{ij}) = P_{i.} - P_{ij}$, where $P_{i.} = \sum_j P_{ij}$ ($P_{.j} = \sum_i P_{ij}$), and that $d_1$ and $d_2$ in (8) may be expressed in the forms

$$d_1 = \lambda_1(W - W')K_{AC} \qquad (36)$$

and

$$d_2 = n(W - W')K_A[\lambda_2 - \{(n - 1)\lambda_1(\lambda_1 - \lambda_2)(W - W')K_{AC}/k\}], \qquad (37)$$

perhaps more directly related to basic design parameters. Substitution of these values in preceding formulas is sufficient to permit analyses of GD designs.

Results given by Kramer and Bradley [1957a] for the intra-block analysis of GD designs follow as special cases of those given here when $W'$ is set equal to zero. (If sums of squares instead of $\chi^2$'s are desired, also take $W = 1$.)
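As a numerical cross-check of (34) to (37), the sketch below computes $K_A$, $K_C$, $K_{AC}$, $d_1$ and $d_2$ for a GD design. It then compares $d_1$, $d_2$ with a second route, which rests on our own derivation and is not quoted from the source: writing the combined normal equations in intra-block form shows that $d_1$, $d_2$ should equal $c_1$, $c_2$ of (1), (3), (4) with $a = r(k-1)$ replaced by $r\{(k-1)W + W'\}/(W - W')$. For GD designs $f = p^1_{12} = 0$ and $g = p^2_{12} = n - 1$.

The example is our own illustration:

- the GD design $v = 4$, $r = 4$, $k = 2$, $b = 8$, $\lambda_1 = 2$, $\lambda_2 = 1$;
- groups $\{1, 2\}$, $\{3, 4\}$;
- weights $W = 1$, $W' = 1/2$.

The function names are hypothetical.

```python
from fractions import Fraction as Fr


def gd_quantities(m, n, r, k, lam1, lam2, W, Wp):
    """K_A, K_C, K_AC, d1, d2 for a GD design, following (34)-(37)."""
    v, w = m * n, W - Wp
    K_A = k / (n * (v * lam2 * W + (r * k - v * lam2) * Wp))                   # (34)
    K_AC = k / ((r * k - r + lam1) * W + (r - lam1) * Wp)                       # (35)
    K_C = K_AC / m                                                             # (35)
    d1 = lam1 * w * K_AC                                                       # (36)
    d2 = n * w * K_A * (lam2 - (n - 1) * lam1 * (lam1 - lam2) * w * K_AC / k)  # (37)
    return K_A, K_C, K_AC, d1, d2


def d_from_catalogue(m, n, r, k, lam1, lam2, W, Wp):
    """c1, c2 of (1), (3), (4) with a replaced by r{(k-1)W + W'}/(W - W'); f = 0, g = n - 1."""
    a = r * ((k - 1) * W + Wp) / (W - Wp)
    f, g = 0, n - 1
    k_delta = ((a + lam1) * (a + lam2) + (lam1 - lam2) * (f * (a + lam2) - g * (a + lam1))) / k
    shared = (lam1 - lam2) * (f * lam2 - g * lam1)
    return (lam1 * (a + lam2) + shared) / k_delta, (lam2 * (a + lam1) + shared) / k_delta


print(gd_quantities(2, 2, 4, 2, 2, 1, Fr(1), Fr(1, 2)))
print(d_from_catalogue(2, 2, 4, 2, 2, 1, Fr(1), Fr(1, 2)))
```

A further check is possible against (10) and (11). Combining them for the row averages $a'_i$ gives $V(a'_i - a'_{i'})/2 = [k + (n-1)d_1 - nd_2]/nr'(k-1)$, which reproduces $K_A$ here.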
LS₂ Designs


LS₂ designs are p.b.i.b. designs with two associate classes, but they are non-group-divisible. The association scheme is a square array with $n^2$ treatments such that two treatments are first associates if they occur in the same row or the same column of the array and are second associates otherwise. The general results of Section 3 apply with $m$ replaced by $n$. We associate basic A- and C-factors with rows and columns respectively of the square association matrix.

The analysis for factorials in LS₂ designs follows the basic pattern of the preceding section. We have found

$$K_A = K_C = k/n[\{n\lambda_1 + n(n - 1)\lambda_2\}(W - W') + rkW'] \qquad (38)$$

and


$$K_{AC} = k/[\{2n\lambda_1 + n(n - 2)\lambda_2\}(W - W') + rkW']. \qquad (39)$$

Also

$$S_1(P_{ij}) = P_{i.} + P_{.j} - 2P_{ij}, \qquad (40)$$

and

$$d_1 = n(W - W')K_A[\lambda_1 - \{(n - 1)(\lambda_1 - \lambda_2)(2\lambda_1 - \lambda_2)(W - W')K_{AC}/k\}],$$

$$d_2 = n(W - W')K_A[\lambda_2 - \{(n - 2)(\lambda_1 - \lambda_2)(2\lambda_1 - \lambda_2)(W - W')K_{AC}/k\}]. \qquad (41)$$


LS₃ Designs

LS₃ designs are p.b.i.b. designs that have association schemes dependent on the properties of the latin square. The $n^2$ treatments are associated with the cells of an $n \times n$ latin square. Two treatments are first associates if they occur in the same row, in the same column, or on the same letter of the square, and are second associates otherwise.

Two possible schemes of associating basic A- and C-factors may be considered. The first scheme, which will be discussed in detail, is that used for the LS₂ designs and involves associating factors with rows and columns of the square. The second scheme assumes that the latin square of the association scheme is one of a set of at least three orthogonal latin squares. Then letters in the two additional squares are used to indicate factor levels for A- and C-factors. The second scheme requires the production of LS₃ designs other than those catalogued, and this is a disadvantage. Whichever scheme is used leads to difficulties with interaction contrasts.

We now examine the first scheme and state that we have found

$$K_A = K_C = k/n[\{2n\lambda_1 + n(n - 2)\lambda_2\}(W - W') + rkW']. \qquad (42)$$

We can also show

$$\chi_{AC}^2 = \Big[\sum_i\sum_j (d'_{ij})^2 - n\sum_L \{d'(L)\}^2\Big]\Big/K_{AC} + \sum_L \{d'(L)\}^2\Big/K_A, \qquad (43)$$

where

$$K_{AC} = k/[\{3n\lambda_1 + n(n - 3)\lambda_2\}(W - W') + rkW'], \qquad (44)$$

but emphasize that this value of $K_{AC}$ may not be used in (17) but is defined only for reduction of necessary formulas. In (43) $\sum_L$ indicates a sum over the $n$ letters, A, B, C, ..., of the latin square, and $d'(L)$ is the average of the $d'_{ij}$ over the generic letter L. $\chi_{AC}^2$ in (43) does have the $\chi^2$-distribution with $(n - 1)^2$ degrees of freedom. In fact, a linear transformation on the $d'_{ij}$ is possible to permit expression of $\chi_{AC}^2$ as a sum of squares, but this is not helpful in the analysis of the factorial.
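A sketch of the LS₃ interaction chi-square is given below. It assumes the form

$\chi_{AC}^2 = [\sum\sum (d'_{ij})^2 - n\sum_L \{d'(L)\}^2]/K_{AC} + \sum_L \{d'(L)\}^2/K_A$.

This form is our reconstruction. It was obtained by splitting the interaction space into letter contrasts, which behave like A- and C-contrasts, and the remainder. The 3 × 3 square and the interaction table are arbitrary illustrations; the design in the example below is 4 × 4. When $K_{AC} = nK_A$, which is what (42) and (44) give at $W = W'$, the expression collapses to $\sum\sum (d'_{ij})^2/K_{AC}$, the form of (17).

```python
from fractions import Fraction


def ls3_interaction_chi2(d, letters, K_A, K_AC):
    """chi^2_AC for an LS3 design, first scheme (A = rows, C = columns).

    d: n x n table of interaction estimates d'_ij; letters: n x n latin square of labels.
    """
    n = len(d)
    total = sum(d[i][j] ** 2 for i in range(n) for j in range(n))
    letter_sums = {}
    for i in range(n):
        for j in range(n):
            letter_sums[letters[i][j]] = letter_sums.get(letters[i][j], 0) + d[i][j]
    letter_sq = sum((s / n) ** 2 for s in letter_sums.values())   # sum over L of d'(L)^2
    return (total - n * letter_sq) / K_AC + letter_sq / K_A


letters = ["ABC", "BCA", "CAB"]
d = [[Fraction(1), Fraction(-1), Fraction(0)],
     [Fraction(-1), Fraction(1), Fraction(0)],
     [Fraction(0), Fraction(0), Fraction(0)]]
print(ls3_interaction_chi2(d, letters, K_A=Fraction(1), K_AC=Fraction(3, 2)))
```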

To consider interaction contrasts in the analysis of factorials in LS₃ designs, it seems necessary to work out variances and covariances of such contrasts for each such design used. The following quantities are helpful:

$$C_1 = \mathrm{Cov}(d'_{ij}, d'_{i'j}) = \mathrm{Cov}(d'_{ij}, d'_{ij'}),$$

$$C_D = \mathrm{Cov}[d'_{ij}(L), d'_{i'j'}(L')], \qquad (45)$$

$$C_L = \mathrm{Cov}[d'_{ij}(L), d'_{i'j'}(L)].$$

The covariance $C_D$ is between $d'_{ij}$'s associated with treatment combinations in different rows, columns, and on different letters of the association square. The covariance $C_L$ is between $d'_{ij}$'s associated with the same letter of the association square and necessarily different rows and columns.

Consider an interaction contrast,

$$I = \sum_i\sum_j u_{ij}d'_{ij}, \qquad (46)$$

with $\sum_i u_{ij} = \sum_j u_{ij} = 0$. (We may have the form $u_{ij} = \xi_i\eta_j$ as discussed in (28) if desired.) Then

$$V(I) = \{V(d'_{ij}) - 2C_1 + C_D\}\sum_i\sum_j u_{ij}^2 + (C_L - C_D)\sum_i\sum_j u_{ij}(L)\sum{}' u(L), \qquad (47)$$

where the last summation in (47) is the sum of all of the coefficients associated with the letter L of the latin square with the exception of $u_{ij}$ itself. We have used $u_{ij}(L)$ to indicate that $u_{ij}$ is associated with the letter L. The summation in (47), while difficult to write down carefully, is conceptually clear. It follows that

$$\chi_I^2 = I^2/V(I) \qquad (48)$$

has the $\chi^2$-distribution with 1 degree of freedom.
The difficulty is that all of the possible, and usually independent, $(n - 1)^2$ $\chi^2$'s like that of (48) are no longer independent. Lack of independence may be considered by examining covariances, many of which, but not all, are still zero. The general formula for such covariances, say between orthogonal contrasts $I(u)$ and $I(u')$, is

$$\mathrm{Cov}[I(u), I(u')] = (C_L - C_D)\sum_i\sum_j u_{ij}(L)\sum{}' u'(L), \qquad (49)$$

with the summation system being the same as in the right-hand member of (47).
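The variance formula for an LS₃ interaction contrast can be checked by brute force. The sketch assumes the form

$V(I) = \{V(d'_{ij}) - 2C_1 + C_D\}\sum\sum u_{ij}^2 + (C_L - C_D)\sum\sum u_{ij}(L)\sum' u(L)$

for coefficients with zero row and column sums. It compares this with a pair-by-pair sum over the covariance classes $V$, $C_1$, $C_L$, $C_D$ defined in (45). The covariance values and the cyclic 4 × 4 square used below are hypothetical; the identity is algebraic, so any values should agree.

```python
from fractions import Fraction
from itertools import product


def var_by_formula(u, letters, V, C1, CD, CL):
    """V(I) from the closed form: squares term plus same-letter cross-products."""
    n = len(u)
    cells = list(product(range(n), repeat=2))
    sum_sq = sum(u[i][j] ** 2 for i, j in cells)
    same_letter = sum(u[i][j] * u[p][q] for (i, j), (p, q) in product(cells, repeat=2)
                      if (i, j) != (p, q) and letters[i][j] == letters[p][q])
    return (V - 2 * C1 + CD) * sum_sq + (CL - CD) * same_letter


def var_by_brute_force(u, letters, V, C1, CD, CL):
    """V(I) summed pair by pair using the covariance classes of (45)."""
    n = len(u)
    cells = list(product(range(n), repeat=2))

    def cov(i, j, p, q):
        if (i, j) == (p, q):
            return V                    # variance of a single d'_ij
        if i == p or j == q:
            return C1                   # same row or same column
        if letters[i][j] == letters[p][q]:
            return CL                   # same letter, different row and column
        return CD                       # different row, column and letter

    return sum(u[i][j] * u[p][q] * cov(i, j, p, q)
               for (i, j), (p, q) in product(cells, repeat=2))


letters = ["ABCD", "BCDA", "CDAB", "DABC"]
xi = [1, 1, -1, -1]
u = [[xi[i] * xi[j] for j in range(4)] for i in range(4)]
args = (u, letters, Fraction(9, 16), Fraction(-3, 16), Fraction(1, 16), Fraction(1, 8))
print(var_by_formula(*args), var_by_brute_force(*args))
```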


Efficiencies of interaction contrasts may be worked out by comparing (47) with the corresponding variance obtainable from (29) and (30) with $m = n$. These efficiencies are $E_{AC}(1)$ for some interaction contrasts and $E_{AC}(2)$ for others, while others yet are bounded by these values. We have

$$E_A = E_C = E_{AC}(1) = nK/K_A \qquad (50)$$

and

$$E_{AC}(2) = n^2K/K_{AC}. \qquad (51)$$

$E_{AC}(1)$ is the efficiency for interaction contrasts among averages of the $d'_{ij}$ over letters of the latin square, and $E_{AC}(2)$ is the efficiency for interaction contrasts among the $d'_{ij}$ associated with the same letter of the latin square. Actually $\chi_{AC}^2$ may be partitioned into two $\chi^2$'s on this basis, but this is not directly useful. We refer to the example below to see how these efficiencies work out.

578 BIOMETRICS, DECEMBER 1960

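General results that we have obtained for LS₃ designs are complete when we note that

$$S_1(P_{ij}) = P_{i.} + P_{.j} + P_L(ij) - 3P_{ij}, \qquad (52)$$

where $P_L(ij)$ is the total of values of $P_{ij}$ occurring on the letter associated with the specified $P_{ij}$ in $S_1(P_{ij})$. For convenience note that

$$d_1 = n(W - W')K_A[\lambda_1 - \{(n - 2)(\lambda_1 - \lambda_2)(3\lambda_1 - 2\lambda_2)(W - W')K_{AC}/k\}] \qquad (53)$$

and

$$d_2 = n(W - W')K_A[\lambda_2 - \{(n - 3)(\lambda_1 - \lambda_2)(3\lambda_1 - 2\lambda_2)(W - W')K_{AC}/k\}], \qquad (54)$$

where $K_A$ and $K_{AC}$ are in (42) and (44) respectively.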

The use of factorials with LS₃ designs cannot be strongly recommended for multi-factor factorials if other suitable p.b.i.b. designs exist, because of the additional complexities in partitioning the interaction chi-square. The partitioning of $\chi_{AC}^2$ for multi-factor factorials in LS₃ designs could be catalogued and, with a listing of possible partitionings for each LS₃ design, difficulties in the use of these designs would be largely removed.


An LS₃ Example


We have considered Design LS14 of the cited catalog of designs in detail. This design has $v = 16$, $r = 3$, $k = 3$, $b = 16$, $\lambda_1 = 0$, $\lambda_2 = 1$, and $n = 4$. It is suitable for a $2^4$ or $2^2 \times 2^2$ factorial. The basic A-factor may be considered as a $2^2$ factorial with two-level factors P and Q; the basic C-factor may be made up from two-level factors R and S. The rows of the association square were given factor combinations $P_0Q_0$, $P_0Q_1$, $P_1Q_0$, and $P_1Q_1$, and the columns $R_0S_0$, $R_0S_1$, $R_1S_0$, and $R_1S_1$, where the zero subscript indicates the lower level of the factor and the unit subscript the higher level. It was found that all interaction contrasts were uncorrelated with the exceptions of the PR-interaction with the PQRS-interaction and the PRS-interaction with the PQR-interaction. Other interactions were estimated with efficiencies $E_{AC}(1)$ or $E_{AC}(2)$ (see the discussion of efficiencies above). Indeed, within each of the two pairs of correlated interaction contrasts, the sum and difference contrasts are independent and are estimated with efficiencies $E_{AC}(1)$ or $E_{AC}(2)$.


Note 


The specific results given in this section have been organized to fit into the system set forth in Section 3. The results were not obtained by a general mathematical approach, although this may be possible, but rather through repeated use of (10) and (11) in association with (14). The general treatment of Section 3 was therefore a scheme for summarizing the results obtained and for presenting a unified approach to the use of factorials in GD, LS₂, and LS₃ designs. At first reading the results obtained for factorials may seem complex; this is not really so. After the basic analysis for the p.b.i.b. designs leading to estimators $t'_{ij}$ of treatment effects, use of factorials is immediately possible, except possibly for the LS₃ designs, with the calculation of $K_A$, $K_C$, and $K_{AC}$.

In concluding this section, we wish to reemphasize, at the risk of being repetitive, that the formulas given are for combined intra- and inter-block analyses. Weights $W$ and $W'$ are estimated in the usual way, and the estimates then replace $W$ and $W'$ in formulas in application. Then variances, covariances, and efficiencies are estimated, and $\chi^2$-values have only approximately the $\chi^2$-distributions indicated. In some situations recovery of inter-block information is not appropriate and only intra-block analysis should be used. The formulas given reduce to intra-block formulas when we set $W' = 0$, and $\chi^2$'s reduce to sums of squares if we also set $W = 1$, as has been stated above. Results obtained may be summarized into analysis of variance or analysis of $\chi^2$ tables and, in presenting numerical analyses, this seems appropriate.

5. FACTORIALS IN OTHER P.B.I.B. DESIGNS


Bose, Clatworthy, and Shrikhande, in addition to the classes of p.b.i.b. designs already considered, discuss two-associate-class designs designated as Simple, Triangular, and Cyclic.

The Simple designs are those with $\lambda_1 \neq 0, \lambda_2 = 0$, or $\lambda_1 = 0, \lambda_2 \neq 0$.
Some such designs fall into classes already discussed but those listed 
under the class do not. For Simple designs, association schemes are 
given by the designs themselves and a unified association system is 
not available. Since there is no standard association pattern, it appears 
that each design requires individual consideration if factorials are 
introduced. Some of the designs would be useful for factorials but 
many have treatment numbers that are not suitable. 
The Cyclic designs have treatment numbers that are primes and
consequently cannot be used with factorials. 

The Triangular designs might be useful in association with factorial 
treatment combinations. We have not, however, developed a suitable 
method of applying the factor levels to the treatments in these designs 
and further discussion is excluded here. 

Brenna [1958] has discussed the use of factorials in certain lattice 
designs. These p.b.i.b. designs usually have more than two associate 
classes and his work represents an extension of the use of factorials 
in incomplete block designs beyond the scope of this paper. 


6. DISCUSSION 


The material in this paper greatly increases the number of incomplete block designs available for use with factorial treatment combinations. Now GD, LS₂, and LS₃ p.b.i.b. designs may be used. Results for balanced incomplete block designs have been given by Cornish [1938] and may also be obtained by setting $\lambda_1 = \lambda_2$ in formulas presented here for GD designs. The work of Brenna adds additional p.b.i.b. designs that may be used.

In industrial applications, blocks will often be determined by time periods, numbers of machine positions, batch size, and the like (in some of which cases only intra-block analyses should be used). Factorials are often used in industrial experiments. Sometimes in such experiments little replication is used, but sometimes replication is necessary. Often fractional factorials are used, and they may still be used in just the same ways in the incomplete block designs. In our example, the four levels of the basic A-factor could have been associated with four treatment combinations selected as ½ of a 2³ factorial, while the four levels of the basic C-factor could have been ¼ of a 2⁴ factorial. The whole experiment would then have been a ⅛ fraction of a 2⁷ factorial. Some preliminary use of factorials in incomplete block designs has been made by Hill and Wheeler [1958], and many other applications may be visualized.

In agricultural applications, the use of factorials in incomplete block 
designs should be widespread. In animal work, litter sizes often require 
the use of incomplete blocks and, as an example, factorials may be 
made up of ration factors in feeding trials. Fertilizer trials in agronomic 
research are often set up as factorials; the use of incomplete block 
designs in such situations should increase efficiencies of such experi- 
ments. 
THE DETECTION OF HOST VARIABILITY IN A DILUTION
SERIES WITH SINGLE OBSERVATIONS


P. ARMITAGE AND G. E. BARTSCH*


Statistical Research Unit of the Medical Research Council, 
London School of Hygiene and Tropical Medicine, London, England. 


1. Introduction 


The “independent action” theory of infective systems has been reviewed by Meynell [1957]. According to this theory, if a group of particles (say bacteria or viruses) is inoculated into a host organism (which may be the whole or part of an animal or plant), a detectable infection can be caused by the successful multiplication of one organism. The particles act independently, and a randomly chosen particle has a probability $p$ of initiating an infection in a particular host. If $p$ is the same for all hosts, and each host receives an inoculum containing $\lambda$ particles (or rather a Poisson variate with mean $\lambda$), the proportion of uninfected hosts is

$$P = e^{-\lambda p}. \qquad (1)$$

If the hosts vary in susceptibility, so that $p$ varies from host to host with a cumulative distribution function $F(p)$, the proportion of uninfected hosts is

$$P = \int_0^1 e^{-\lambda p}\,dF(p). \qquad (2)$$

The effect of variability in $p$ is, broadly speaking, to flatten the slope of the response curve relating the proportion of uninfected hosts (or some transform of this proportion) to the dose $\lambda$ (or its logarithm).
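The flattening effect can be illustrated numerically. Two choices in this sketch are ours and not the authors':

- The transform is the complementary log-log, $\log(-\log P)$. Under (1) it equals $\log\lambda + \log p$, so it has unit slope against $\log\lambda$.
- $F(p)$ is a hypothetical two-point distribution in the integral (2).

Under host variability the fitted slope comes out below 1.

```python
import math


def uninfected_constant(dose, p):
    """Equation (1): proportion of uninfected hosts when every host has the same p."""
    return math.exp(-dose * p)


def uninfected_mixture(dose, ps, weights):
    """Equation (2) for a discrete F(p) placing probability weights[k] on ps[k]."""
    return sum(w * math.exp(-dose * p) for p, w in zip(ps, weights))


def cloglog_slope(prop_uninfected, dose_lo, dose_hi):
    """Slope of log(-log P) against log dose between two doses (1 under equation (1))."""
    y_lo = math.log(-math.log(prop_uninfected(dose_lo)))
    y_hi = math.log(-math.log(prop_uninfected(dose_hi)))
    return (y_hi - y_lo) / (math.log(dose_hi) - math.log(dose_lo))


print(cloglog_slope(lambda d: uninfected_constant(d, 0.1), 1.0, 100.0))
print(cloglog_slope(lambda d: uninfected_mixture(d, [0.01, 1.0], [0.5, 0.5]), 1.0, 100.0))
```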

A set of observed quantal response data will typically consist of the numbers of infected and uninfected hosts out of a total of $n_i$ at each of a series of doses $\lambda_i$. A number of tests have been proposed for the hypothesis that the response curve follows the exponential curve (1), with the aim of detecting the sort of flattening implied by (2); see, for example, Moran [1954a, b, 1958], Armitage and Spicer [1956], Stevens [1958], Armitage [1959a]. Of these test criteria, the $T$-test of Moran [1954a, b] and the two equivalent statistics of Moran [1958] and Stevens [1958] are very simple to calculate, while that of Armitage [1959a] is asymptotically efficient.
We consider in this paper the problem of testing the hypothesis of constant $p$ for the particular case in which $n_i = 1$ and the doses follow a 2-fold dilution series. It is true that a single dilution series with one observation at each dose is unlikely to be much used in experimental work, because the precision of an estimated endpoint (say, the ED 50) will be very low. There are, however, experiments in which the total number of replicates at each dose is sufficiently high to ensure a reasonably precise endpoint, but in which the experimental design involves blocking in such a way that each block provides a dilution series of single observations. One set of experimental data of this type is shown in Table 1. Each row of Table 1 provides a dilution series from a single side of a rabbit (the side of a rabbit being the “block” in this example). To test for variability in $p$ between the host units within a block it is necessary to consider the problem of single observations. Other experiments providing data of this type are described by Finter and Armitage [1957, Table 1].
Of the simple statistics hitherto proposed for the general case, Moran’s $T$-statistic is not applicable, since for $n = 1$ it always takes the value zero. The Stevens-Moran range statistic, $R$, can be recorded at a glance, and its null distribution for two-, four- and ten-fold dilutions is known (Stevens [1958]). There is, however, an equally simple statistic, denoted below by $J$, which, there are reasons for believing, yields tests more powerful than those based on $R$. One hesitates to add yet another test to the list of candidates already in the field, but it seemed worth while to make some comparison of the relative merits of $J$ and $R$.


2. Various test statistics 


Stevens’ [1958] range statistic $R$ is defined as the number of dilutions between (and including) the first at which not all observations are positive (i.e. infection is observed) and the last at which not all are negative. Moran’s [1958] statistic $D$ is equal to $R - 1$. Thus, in the following series of single observations:


+ + + 0 + 0 + + 0 0 0 ···,


the value of $R$ is 5.
The new statistic $J$ is defined, for a series of single observations, as the number of positive results at dilutions beyond that at which the first negative occurs. Thus, in the series given above, $J = 3$. Note that $J$ is asymmetric. One could define a complementary statistic, $J'$, as the number of negative results at dilutions before the last positive occurs. In our example, $J' = 2$, and in general $J + J' = R$. For reasons given below, however, $J$ is the more interesting statistic for our purposes. The definition holds only for $n_i = 1$. It could be generalized in a number of different ways, but we do not consider here the situation with general values of $n_i$.
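The definitions of $R$, $J$ and $J'$ translate directly into a few lines of code. In this sketch a series is a string of '+' (positive) and '0' (negative) responses, strongest dose first. The series is assumed, as in the text, to be preceded by positives and followed by negatives. The function name is ours. Moran's $D$ is then simply $R - 1$.

```python
def range_and_j(series):
    """Return (R, J, J') for single observations at successive dilutions.

    series: string of '+' (positive) and '0' (negative), strongest dose first,
    assumed to be preceded by positives and followed by negatives.
    """
    if "0" not in series or "+" not in series:
        return 0, 0, 0
    first_negative = series.index("0")
    last_positive = series.rindex("+")
    if last_positive < first_negative:
        return 0, 0, 0                        # no equivocal range
    window = series[first_negative:last_positive + 1]
    R = len(window)                           # Stevens' range statistic
    J = window.count("+")                     # positives beyond the first negative
    J_prime = window.count("0")               # negatives before the last positive
    return R, J, J_prime


print(range_and_j("+++0+0++000"))  # (5, 3, 2), the series in the text
```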

Before a discussion of the reasons for introducing $J$, we describe briefly the asymptotically efficient method proposed by Armitage [1959a]. Suppose that, at a series of known dilutions $x_i$, the numbers of positive and negative hosts observed are $r_i$ and $n_i - r_i$ respectively. The formula for the exponential response curve (1) may be written

$$P_i = e^{-\gamma x_i}, \qquad (3)$$

where $\gamma x_i$ is the value of $\lambda_i p$ at dilution $x_i$ (i.e. $\gamma$ is the value of $\lambda p$ in the undiluted inoculum, for which $x_i = 1$), and $P_i$ is the expected proportion of negative hosts. The parameter $\gamma$ may be estimated by maximum likelihood (see, for example, Finney [1952, §21.5] and Peto [1953]). Denoting the estimate by $\hat\gamma$, and substituting in (3), we have maximum likelihood estimates $\hat P_i$ of $P_i$ and $\hat Q_i\,(= 1 - \hat P_i)$ of $Q_i\,(= 1 - P_i)$. Then the appropriate statistic is


$\phi$ [expression (4); not legible in the source scan]


The statistic $\phi$ is a weighted sum of deviations of observed from expected frequencies, the weights increasing with increasing dose. For large and equal values of $n_i\,(= n)$ the asymptotic variance of $\phi$ is $1.0749n$, and on the null hypothesis its expectation is zero. The large-sample test based on $\phi$ is asymptotically equivalent to a test of the maximum likelihood estimate of a parameter representing the coefficient of variation of $p$. An approximation to $\phi$ is obtained by fitting the exponential response curve by a non-efficient method (e.g. that of Fisher and Yates [1957]), giving estimated values $\gamma'$, $P'_i$, $Q'_i$ instead of $\hat\gamma$, $\hat P_i$ and $\hat Q_i$. Then the statistic proposed is


$\phi'$ [expression (5); not legible in the source scan]
Now, the calculation of $\phi$ or $\phi'$ is considerably more cumbersome than that of either of the simpler statistics $R$ and $J$ described above, or Moran’s previous statistic $T$. Consideration of efficient statistics may, nevertheless, provide a useful guide to the choice of less efficient alternatives. The method of derivation of $\phi$ and $\phi'$ indicates that they have optimal properties for large sample sizes, but provides no assurance that these properties extend to finite samples, particularly to a sample size of one observation at each dilution. In the absence of any indication to the contrary, however, $\phi$ and $\phi'$ would have, at least, a claim to be considered for small-sample work. There is, moreover, an intuitive argument in favour of $\phi$ and $\phi'$. They are particularly sensitive to departures below the fitted curve at high doses, and it is known (Armitage and Spicer [1956]) that small variations of $p$ lead to departures from the exponential curve which are most readily detected at these high concentrations, where the proportion of positives approaches unity.

This latter consideration suggests that a desirable test statistic should not be symmetric, as are T and R, and is an argument in favour of J. Suppose that observations are made at successive a-fold dilutions, and that the first negative result occurs at the dilution z = a^(−m). Then, with no assumption about the functional form of the response curve, the Spearman-Kärber estimate of the ED 50 (the dilution at which P = 0.5) is a^(−(m + J − ½)) (Finney [1952], §20.6). Thus J − ½ is the number of dilutions between that at which the first negative occurs and the Spearman-Kärber estimate of the ED 50. J therefore has the property, stated above to be desirable, of being particularly sensitive to negative results at high doses. Another way of expressing this feature of J is that, according to the Fisher-Yates method of fitting the exponential curve (Fisher and Yates [1957], Table VIII.2), the estimated value of log(λ_0 p) at the first negative dilution is J log a − K, where K is a quantity tabulated by Fisher and Yates which is fairly constant if a wide range of dilutions is used. Thus, J gives an indication of the expected number of "effective" particles at the first negative dilution. The larger the value of J, the more surprising is the occurrence of the most extreme negative result.
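The Spearman-Kärber argument can be checked on a concrete series. The Python sketch below assumes that a series is written as a string of `+` and `0`, ordered from the most concentrated to the most dilute dilution, with index 0 at the first character, all earlier dilutions positive and all later ones negative; it takes J to be the number of positives after the first negative, and the function name `j_and_sk_position` is hypothetical.

```python
def j_and_sk_position(series):
    """Return (J, Spearman-Karber position of the ED 50 in dilution steps).

    `series` holds '+' (positive) and '0' (negative) results ordered from the
    most concentrated to the most dilute dilution; index 0 is the first
    character. Dilutions before index 0 are assumed positive, those after the
    end negative.
    """
    if set(series) - {"+", "0"}:
        raise ValueError("series may contain only '+' and '0'")
    m = series.find("0")                 # index of the first negative result
    if m == -1:
        raise ValueError("series contains no negative result")
    j = series[m + 1:].count("+")        # J: positives after the first negative
    # Spearman-Karber with unit spacing: position = (number of positives) - 1/2,
    # and the number of positives is m (before the first negative) plus J.
    return j, m + j - 0.5


# First negative at index 3 and one later positive: J = 1, and the estimate
# lies J - 1/2 = 0.5 step beyond the first negative.
print(j_and_sk_position("+++0+00"))      # (1, 3.5)
```

Because the positives before the first negative contribute m to the sum in the Spearman-Kärber formula, the distance from the first negative to the estimate is always J − ½, whatever the value of m.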


3. Relationship between J and 


We have seen that J, φ and φ′ would all be expected to be sensitive to the same sort of departure from the exponential response curve. That tests based on J and φ′ must be closely equivalent is shown by Table 2. This gives the values of R, J and φ′ for all possible results with R = 0 to 6, in a series of 2-fold dilutions. For the calculation of φ′ the exponential curve was fitted by the Fisher-Yates method. The table shows only the results in the "equivocal" range of dilutions, and each series of positive (+) and negative (0) responses should be preceded by a (supposedly infinite) set of positives and followed by a set of negatives. Thus, for R = 3, J = 1, the series of responses is

... + + + 0 0 + 0 0 0 ...

TABLE 2
VALUES OF φ′ FOR VARIOUS SERIES OF RESPONSES WITH GIVEN VALUES OF R AND J
(each entry is a series of responses, followed by its value of φ′ in parentheses)

         J = 0     J = 1            J = 2           J = 3           J = 4           J = 5
R = 0    (−0.4)
R = 2              0+ (−0.3)
R = 3              00+ (−0.2)       0++ (0.3)
R = 4              000+ (−0.2)      0+0+ (0.4)      0+++ (3.0)
                                    00++ (0.5)
R = 5              0000+ (−0.2)     0+00+ (0.4)     0++0+ (3.1)     0++++ (17.8)
                                    00+0+ (0.6)     0+0++ (3.2)
                                    000++ (0.6)     00+++ (3.9)
R = 6              00000+ (−0.2)    0+000+ (0.4)    0++00+ (3.1)    0+++0+ (15.9)   0+++++ (71.6)
                                    00+00+ (0.6)    0+0+0+ (3.3)    0++0++ (16.0)
                                    000+0+ (0.6)    0+00++ (3.3)    0+0+++ (16.6)
                                    0000++ (0.6)    00++0+ (3.9)    00++++ (19.4)
                                                    00+0++ (4.0)
                                                    000+++ (4.1)
It is clear from Table 2 that an ordering of the results in terms of φ′ is equivalent to the ordering (which involves ties) in terms of J, and that this ordering differs from that in terms of R.

We have used φ′ here rather than φ because of the greater ease of computation. Values of φ for some of the response series are shown in Table 3. The approximation of φ by φ′ is poor at the higher values of each. This would be expected, since the correction provided by the second term in the expression for φ′ is inappropriate when the null hypothesis is strongly contradicted (cf. Armitage [1959a], p. 4). Nevertheless φ, like φ′, is clearly more closely related to J than to R.
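The structure of Table 2 can be regenerated by listing every series in the equivocal range and grouping it by J. This sketch assumes, as the layout of Table 2 suggests, that a series of length R begins with the first negative, ends with the last positive, and leaves the R − 2 results in between unrestricted; the function name `series_by_j` is hypothetical.

```python
from collections import defaultdict
from itertools import product


def series_by_j(r):
    """Group every equivocal-range series of length r (r >= 2) by its J.

    Assumption inferred from Table 2: the series starts with the first
    negative ('0'), ends with the last positive ('+'), and the r - 2 results
    in between are unrestricted.
    """
    groups = defaultdict(list)
    for middle in product("+0", repeat=r - 2):
        series = "0" + "".join(middle) + "+"
        groups[series.count("+")].append(series)   # J = positives after first '0'
    return dict(groups)


# The number of series with a given J is the binomial coefficient C(r - 2, J - 1).
for r in range(2, 7):
    counts = {j: len(s) for j, s in sorted(series_by_j(r).items())}
    print(r, counts)
```

Each group is a set of tied results under J; Table 2 shows that φ′ separates the members of a group only slightly compared with the gaps between groups.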


4. Distribution of J on the null hypothesis for a two-fold series 


With the exception of the case of J = 0 the frequency function of 
J was obtained by enumerating the possible sequences of positive and 
negative results and computing their probabilities. The case where 
no positive results follow the first negative one (J = 0) is equivalent 
to the case of R = 0. Stevens has shown that Prob [R = 0] = 0.5, 
and hence Prob [J = 0] = 0.5. 

The nature of the computations for other values of the frequency function of J can be seen by considering the case of J = 1. The basic pattern of results, with the first negative result at the m-th dilution (−∞ < m < ∞), is as follows:

dilution:   ...   m−1    m    m+1   m+2   m+3   ...
            ...    +     0     +     0     0    ...
            ...    +     0     0     +     0    ...
            ...    +     0     0     0     +    ...
and so on,

where each series is preceded by an infinite series of positive results and followed by a similar series of negative ones. At the i-th dilution, where z_i = 2^(−i), the probability of a negative result is

$$P_i = \exp(-y z_i),$$

where y is defined as in §2. The probability of a positive result is Q_i = 1 − P_i. The probability of any row of the preceding doubly infinite sequence, summed over all values of m, is not independent of y but is a periodic function of log y, with period log 2. When investigating the effect of the periodic component upon the frequency function of R for a two-fold dilution series with a single host at each dilution, Stevens found the probabilities to vary slightly in the sixth place. Since R and J are highly correlated, the effect of periodicity upon the frequency function of J has been assumed to be negligible. For all computations here y has been put equal to 3.

DILUTION SERIES WITH SINGLE OBSERVATIONS 589
TABLE 3
VALUES OF THE STATISTICS φ AND φ′ FOR VARIOUS SERIES OF RESULTS

Series     J    R     φ′      φ
0+         1    2    −0.3    −0.3
00+        1    3    −0.2    −0.2
0++        2    3     0.3     0.2
00++       2    4     0.5     0.4
0+++       3    4     3.0     1.4
0+0++      3    5     3.2     1.4

To calculate the probabilities of the frequency function of J with reasonable accuracy, those sequences of positive and negative results whose probabilities of occurrence were greater than or equal to 0.000001 were listed. Before these probabilities were added together to form the frequency function of J, two different rules for rounding them off to five places were used. The first was based upon the assumption that the probabilities in the range 0.000001 to 0.000009 were uniformly distributed over the range, and hence that the ordinary arithmetic rule would be appropriate. The second rule made an adjustment for the rapidly increasing number of sequences having small probabilities: it was to round up to 0.00001 if the logarithm of the probability of the sequence was greater than 5.5 (the first rule, when expressed in logarithms, has as its decision point 5.69897). The results from the use of these two rules appear in Table 4, columns (1) and (3). The validity of these approximations was studied by repeating the rounding-off procedures, using them to round from the fifth to the fourth places. These results also appear in Table 4, columns (2) and (4). A comparison of columns (1) and (2) in Table 4 shows that rounding off in the fifth place by the arithmetic rule underestimates the probabilities obtained by rounding off in the sixth place. This bias is still present when the logarithmic rule is used, but its effect can be seen to be not more than 3 in the last place. Hence, the frequency function of J given in column (3) of Table 4 is assumed to be correct to four decimal places.
TABLE 4
FREQUENCY FUNCTION OF J ON THE NULL HYPOTHESIS FOR A TWO-FOLD SERIES OF DILUTIONS*

               Arithmetic round-off             Logarithmic round-off
          (1) 5 decimals   (2) 4 decimals   (3) 5 decimals   (4) 4 decimals
  k        Prob[J = k]      Prob[J = k]      Prob[J = k]      Prob[J = k]
  0          0.50000          0.5000           0.50000          0.5000
  1          0.33834          0.3380           0.33841          0.3385
  2          0.12561          0.1247           0.12586          0.1262
  3          0.03013          0.0289           0.03048          0.0303
  4          0.00455          0.0038           0.00475          0.0047
  5          0.00037          0.0001           0.00040          0.0003
Total        0.99900          0.9955           0.99990          1.0000

*Probabilities not calculated for k > 5.
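The same frequency function can also be obtained without listing sequences, by a forward recursion over the dilutions that records whether a negative has yet occurred and how many positives have followed it. This is our own alternative to the enumeration described above, not the authors' procedure; it assumes P_i = exp(−y z_i) with z_i = 2^(−i) and y = 3 as in the text, and the function name and the truncation limits `i_min`, `i_max` and `k_max` are illustrative choices.

```python
import math


def null_j_distribution(y=3.0, i_min=-20, i_max=60, k_max=8):
    """Prob[J = k] for k = 0..k_max (last cell = k >= k_max), one host per dilution.

    Assumed model: at dilution i (z_i = 2**-i) the result is negative with
    probability P_i = exp(-y * z_i), independently across dilutions.
    Dilutions below i_min are treated as certain positives and those above
    i_max as certain negatives, which is harmless when both limits are far
    from the ED 50.
    """
    no_negative_yet = 1.0            # Prob[no negative among dilutions so far]
    dist = [0.0] * (k_max + 1)       # dist[k]: first negative seen, k positives since
    for i in range(i_min, i_max + 1):
        lam = y * 2.0 ** (-i)        # expected number of particles at dilution i
        p_neg = math.exp(-lam)
        q_pos = -math.expm1(-lam)    # 1 - P_i, accurate when P_i is close to 1
        new = [d * p_neg for d in dist]          # a negative leaves k unchanged
        for k in range(k_max):
            new[k + 1] += dist[k] * q_pos        # a positive moves k to k + 1
        new[k_max] += dist[k_max] * q_pos        # k >= k_max is an absorbing cell
        new[0] += no_negative_yet * p_neg        # this is the first negative
        no_negative_yet *= q_pos
        dist = new
    return dist


for k, p in enumerate(null_j_distribution()[:6]):
    print(f"Prob[J = {k}] = {p:.5f}")
```

Since the recursion involves no rounding beyond floating point, its output can be set beside columns (1) and (3) of Table 4; fixing y = 3 keeps the periodic component in the result, which the text reports to be negligible.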


5. Power calculations 


Under the assumption of host variability, the probability of a host remaining uninfected at the i-th dilution is

$$P_i = \int_0^1 \exp(-\lambda_0 z_i p)\, dF(p),$$

where λ_0 is the number of particles in the undiluted inoculum, and F(p) is the cumulative distribution of p. Several distributions, such as the beta, truncated exponential and gamma functions, have been suggested for F(p). Here a two-point binomial distribution was chosen, with half of the hosts having a susceptibility level p_0 and the remaining half 1 − p_0. This simple, although somewhat artificial, function for F(p) does permit the variance of p, σ² = E(p − p̄)², to be varied systematically over its possible range from 0 to ¼. The values of p_0 selected were p_0 = 0.5, 0.3, 0.1, 0.05, 0.01, and 0.002. As in §4 the periodic component has been ignored, and in all computations λ_0 was taken to be 1.

From Table 4 the critical region at the 5 percent level for the statistic J is J ≥ 3. For the statistic R, the two values of R on either side of the 5 percent level are R = 6 and R = 7. The corresponding probabilities for these three critical regions are Prob[J ≥ 3] = 0.03563, Prob[R ≥ 6] = 0.05160 and Prob[R ≥ 7] = 0.02637. The calculations of the ordinates of the power curve of J proceeded as described in the preceding section: the complement of the power was calculated and subtracted from unity.

FIGURE 1
POWER CURVES FOR TESTS BASED ON J
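The power of the J-test can be approximated by simulation as a rough cross-check. This sketch substitutes Monte Carlo sampling for the exact calculation used in the text (a different technique, chosen for brevity); it assumes the two-point susceptibility distribution above with λ_0 = 1 and the critical region J ≥ 3, and all function names are hypothetical.

```python
import math
import random


def prob_negative(z, lam0, p0):
    """P_i for the two-point susceptibility distribution: half of the hosts at
    p0 and half at 1 - p0, so P_i = E[exp(-lam0 * z * p)]."""
    return 0.5 * math.exp(-lam0 * z * p0) + 0.5 * math.exp(-lam0 * z * (1.0 - p0))


def variance_of_p(p0):
    """Var(p) for the two-point distribution; its mean is always 1/2."""
    return (0.5 - p0) ** 2


def simulate_j(neg_probs, rng):
    """Simulate one series (concentrated to dilute) and return J."""
    seen_negative, j = False, 0
    for p in neg_probs:
        negative = rng.random() < p
        if seen_negative and not negative:
            j += 1                       # a positive after the first negative
        elif negative:
            seen_negative = True
    return j


def power_of_j_test(p0, lam0=1.0, critical=3, n_sims=20000, seed=1,
                    i_min=-20, i_max=30):
    """Monte Carlo estimate of Prob[J >= critical] for a two-fold series."""
    zs = [2.0 ** (-i) for i in range(i_min, i_max + 1)]
    neg_probs = [prob_negative(z, lam0, p0) for z in zs]
    rng = random.Random(seed)
    hits = sum(simulate_j(neg_probs, rng) >= critical for _ in range(n_sims))
    return hits / n_sims


for p0 in (0.5, 0.3, 0.1, 0.05, 0.01, 0.002):
    print(p0, variance_of_p(p0), power_of_j_test(p0, n_sims=2000))
```

At p_0 = 0.5 the two susceptibility levels coincide, so the estimate should fall near the size Prob[J ≥ 3] = 0.0356 quoted above, within sampling error.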

The resulting power curves are depicted in Figure 1. For convenience the power is shown as a function of log[(1 − p_0)/p_0]. For small values of log[(1 − p_0)/p_0], i.e. small departures from the null hypothesis, the power curve for J climbs more rapidly than the two curves for R. This result might be expected from the considerations discussed earlier. At high values of log[(1 − p_0)/p_0], however, the J-test appears to have lost its advantage.

6. Discussion

The calculations made here are not very extensive, but they suggest 
that the asymptotic advantages of J, for small departures from the 
null hypothesis, do not extend to the situation where the departure 
is sufficiently great for the power to approach unity. For a single series, 
of course, it is precisely these large, detectable, departures which should 
interest us. On the other hand, it has already been pointed out that 
a single series is only likely to be used as one out of a number of blocks, 
and the greater the degree of replication the smaller the departures 
which one can expect to detect. It is perhaps relevant that Armitage 
[1959b], examining experimental tumour data like those shown in 
Table 1, found that a departure from the null hypothesis could be 
detected by the J-test but not by the R-test. The method used was 
to calculate a value of J for each block (e.g. each row of Table 1), and
to compare the observed distribution of J with that expected on the 
null hypothesis (see Table 4). Departures from the null hypothesis 
would be indicated by an excessive frequency of high values of J. 
INDIVIDUAL DEGREES OF FREEDOM FOR TESTING
HOMOGENEITY OF REGRESSION COEFFICIENTS
IN A ONE-WAY ANALYSIS OF COVARIANCE


D. S. ROBSON AND G. F. ATKINSON


Cornell University, Ithaca, New York, U.S. A. 


INTRODUCTION 


Homogeneity of the within-treatment regression coefficients is a necessary condition for the validity of the adjustments made to the treatment means in a one-way analysis of covariance. These adjustments require that the within-treatment regression functions differ only with respect to their intercepts, or zero-degree terms, and comparisons among adjusted treatment means are then in effect comparisons among the intercepts. When the treatments and the factor represented by the covariate interact in producing their effect, heterogeneity of regression coefficients results, and the treatment effects adjusted in the usual manner no longer estimate the true effects.

The assumption of linear regressions within treatments, which is implicit in any test of homogeneity of linear regressions, limits the type of interaction which can occur between treatment and covariate. In this case the null hypothesis being tested is characterized by families of parallel straight lines, and the alternative interaction hypotheses, against which a homogeneity test is designed to have power, are represented by families of intersecting but still linear regression lines. The power of the conventional F-test of homogeneity is then an increasing function of the weighted variance among the slopes of these intersecting straight lines, Σ w_i(β_i − β̄)², where β̄ = Σ w_iβ_i/Σ w_i and the weighting coefficients w_i are determined by the levels of the covariate used in the experiment.

Under some circumstances it seems reasonable to restrict the class of alternative hypotheses even further by imposing some regularity condition on the families of intersecting straight lines, and then to modify the structure of the homogeneity test so as to increase the power against this restricted class of alternatives. For example, in situations where the dependent variable is logically zero at the zero level of the covariate, a reasonable alternative to parallel regression lines is a family of lines intersecting at a common point near the origin. Families of regression lines satisfying this regularity condition (namely, that each family consist of a pencil of lines) are characterized by the fact that within such a family the slope β_i is a linear function of the intercept α_i (see Figure 1a): if every line passes through the point (x_0, y_0), then α_i + β_i x_0 = y_0, so that β_i = (y_0 − α_i)/x_0. A homogeneity test whose power is an increasing function of the (squared) linear correlation or regression of β_i on α_i would therefore be particularly sensitive against this special class of alternative hypotheses. Insofar as practical situations do arise where the slopes of the linear regression lines tend to be greater for the better treatments, this test would then be more powerful than the conventional test of homogeneity.

FIGURE 1
AN ILLUSTRATION OF FAMILIES OF LINES IN WHICH THE SLOPE β IS A POLYNOMIAL FUNCTION OF THE INTERCEPT α.

More generally, if the regression lines do not intersect at a common point, then the relationship between slope and intercept will be nonlinear; Figure 1b illustrates a configuration in which slope is a quadratic function of the intercept. Whatever the configuration of t regression lines, the slopes may be expressed as a polynomial function of the intercepts of degree at most t − 1. If the within-treatment slopes are homogeneous, then the degree of this polynomial will be zero; otherwise, the degree will be greater than zero and will depend upon the pattern of heterogeneity. We propose to exploit this fact by constructing the orthogonal polynomial regression of the estimated slopes b_i on the adjusted intercepts â_i, and then testing homogeneity of slope by testing the significance of the individual coefficients of this orthogonal polynomial. Such a test procedure amounts to partitioning the ordinary F-test "among regression coefficients" into (at most) t − 1 individual tests designed to have greater power against the specific patterns of heterogeneity illustrated in Figure 1. This is essentially the spirit of Tukey's single degree of freedom test for non-additivity [1], and the mathematical justification follows that outlined by Tukey.


THEORY FOR TESTING THE POLYNOMIAL REGRESSION OF 
SLOPE ON ADJUSTED TREATMENT MEAN. 


The usual statistical model for a one-way covariance analysis involving t treatments requires that at any given level X of the concomitant factor the response Y under the i'th treatment be a normally distributed chance variable with mean α_i + βX and variance σ². The one-way experiment consists of taking n_i independent observations Y_i1, ..., Y_in_i under the i'th treatment at the levels X_i1, ..., X_in_i, respectively, of the covariate. For fixed X_i1, ..., X_in_i, the two statistics

$$\bar y_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}, \qquad b_i = \frac{1}{w_i}\sum_{j=1}^{n_i} Y_{ij}(X_{ij} - \bar x_i), \qquad \text{where } w_i = \sum_{j=1}^{n_i}(X_{ij} - \bar x_i)^2,$$

are normally and independently distributed with means α_i + βx̄_i and β, respectively, and variances σ²/n_i and σ²/w_i, respectively. The statistic

$$s_i^2 = \sum_{j=1}^{n_i}\left[Y_{ij} - \bar y_i - b_i(X_{ij} - \bar x_i)\right]^2$$

is then distributed as σ²χ²_(n_i − 2) independently of ȳ_i and b_i. Furthermore, in the one-way classification, the t different sets of such statistics are mutually independent.
The statistics b_1, ..., b_t are combined in the form of a weighted average

$$b_w = \sum_{i=1}^{t} w_i b_i \Big/ W, \qquad W = \sum_{i=1}^{t} w_i,$$

as an estimate of β, and the statistics s_1², ..., s_t² are combined in the form

$$s^2 = \sum_{i=1}^{t} s_i^2 \Big/ \sum_{i=1}^{t}(n_i - 2)$$

as an estimate of σ². The intercept α_i for the i'th treatment is then estimated by the "adjusted intercept"

$$\hat a_i = \bar y_i - b_w \bar x_i.$$

The unbiased property of the adjusted intercepts requires that the regression coefficients b_1, ..., b_t do, in fact, estimate a common parameter β. This hypothesis of homogeneous regression coefficients is conventionally tested using as the test statistic

$$F = \frac{\sum_{i=1}^{t} w_i (b_i - b_w)^2 \,/\, (t - 1)}{s^2},$$

which is distributed as Snedecor's F with t − 1 and Σ(n_i − 2) degrees of freedom when the hypothesis of homogeneity is true. Since heterogeneity of slopes may be expected to be accompanied by some sort of regular relationship between slope and intercept, we propose to partition the sum of squares Σ w_i(b_i − b_w)² into t − 1 individual terms, each with 1 degree of freedom, for testing the t − 1 coefficients in the orthogonal polynomial regression of slope b_i on adjusted intercept â_i.
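For this purpose we wish to express Σ w_i(b_i − b_w)² as

$$\sum_{i=1}^{t} w_i(b_i - b_w)^2 = B_1^2 + B_2^2 + \cdots + B_{t-1}^2$$

and have

$$b_i - b_w = B_1 f_1(\hat a_i) + B_2 f_2(\hat a_i) + \cdots + B_{t-1} f_{t-1}(\hat a_i),$$

where f_v(â_i) is a polynomial of degree v in â_i. This is accomplished by constructing the functions f_1, ..., f_(t−1) so that

$$\sum_{i=1}^{t} w_i f_v(\hat a_i) f_{v'}(\hat a_i) = \begin{cases} 0 & \text{if } v \ne v', \\ 1 & \text{if } v = v', \end{cases}$$

and the functions are uniquely determined if we add the additional restrictions

$$\sum_{i=1}^{t} w_i f_v(\hat a_i) = 0$$

for v = 1, ..., t − 1. Under these conditions the solution for the coefficients B_1, ..., B_(t−1) is given by

$$B_v = \sum_{i=1}^{t} w_i b_i f_v(\hat a_i).$$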


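A relatively simple method for constructing the functions f_1, ..., f_(t−1) has been given by Robson [2] for the special case of equal weights, w_i = 1; the method is easily modified, however, to give the relation

$$c_v f_v(\hat a) = \hat a^{\,v} - \sum_{k=1}^{v-1}\Big[\sum_{i=1}^{t} w_i \hat a_i^{\,v} f_k(\hat a_i)\Big] f_k(\hat a) - \frac{1}{W}\sum_{i=1}^{t} w_i \hat a_i^{\,v},$$

where c_v is a constant determined by the condition Σ_i w_i f_v²(â_i) = 1. In particular, writing â_w = Σ_i w_i â_i / W,

$$f_1(\hat a) = \frac{\hat a - \hat a_w}{c_1} = \frac{\hat a - \hat a_w}{\big[\sum_{i=1}^{t} w_i(\hat a_i - \hat a_w)^2\big]^{1/2}},$$

so that

$$B_1 = \frac{\sum_{i=1}^{t} w_i b_i(\hat a_i - \hat a_w)}{\big[\sum_{i=1}^{t} w_i(\hat a_i - \hat a_w)^2\big]^{1/2}}.$$

The function f_2 is determined from the relation

$$c_2 f_2(\hat a) = \hat a^2 - \Big[\sum_{i=1}^{t} w_i \hat a_i^2 f_1(\hat a_i)\Big] f_1(\hat a) - \frac{1}{W}\sum_{i=1}^{t} w_i \hat a_i^2,$$

and so on. It is unlikely that any effects beyond the quadratic would be computed in practice, and ordinarily we would expect the experimenter to compute only the linear term B_1² and the deviations from linear, Σ w_i(b_i − b_w)² − B_1².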
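The linear term and the deviations from linear need only the weights, slopes and adjusted intercepts. The sketch below is our own illustration of the formula for B_1 above; the function name is hypothetical, and the per-lot figures are the summaries printed in Tables 2 and 3 of the numerical example that follows.

```python
import math


def linear_and_remainder(w, b, a_hat):
    """Split sum w_i (b_i - b_w)^2 into B_1^2 and the deviations from linear.

    w: weights w_i; b: within-treatment slopes b_i; a_hat: adjusted intercepts.
    """
    W = sum(w)
    b_w = sum(wi * bi for wi, bi in zip(w, b)) / W
    a_w = sum(wi * ai for wi, ai in zip(w, a_hat)) / W
    c1 = math.sqrt(sum(wi * (ai - a_w) ** 2 for wi, ai in zip(w, a_hat)))
    f1 = [(ai - a_w) / c1 for ai in a_hat]        # sum w f1 = 0, sum w f1^2 = 1
    B1 = sum(wi * bi * fi for wi, bi, fi in zip(w, b, f1))
    total = sum(wi * (bi - b_w) ** 2 for wi, bi in zip(w, b))
    return B1 ** 2, total - B1 ** 2


# Summary figures for the four lots of pigs (Tables 2 and 3).
w = [1610.0, 640.5, 1066.1, 945.6]              # w_i: sum of squares of X
wb = [15.760, 8.745, 5.607, 1.164]              # w_i b_i: sum of products
xbar = [54.0, 50.5, 50.7, 53.2]
ybar = [1.662, 1.481, 1.299, 1.198]
b = [s / wi for s, wi in zip(wb, w)]
b_w = sum(wb) / sum(w)
a_hat = [yb - b_w * xb for yb, xb in zip(ybar, xbar)]
print(linear_and_remainder(w, b, a_hat))
```

Working from unrounded adjusted intercepts, the results can differ in the fourth decimal from those obtained with the rounded intermediate values of Table 3.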

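It now remains to show that the statistic

$$F_v = B_v^2 / s^2$$

is actually distributed as Snedecor's F under the hypothesis of homogeneous regression coefficients. We note that as a function of the original observations Y_ij, the orthogonal polynomial regression coefficient

$$B_v = \sum_{i=1}^{t} w_i (b_i - b_w) f_v(\hat a_i)$$

is a homogeneous function of degree 1, but is not linear. For fixed ȳ_1, ..., ȳ_t and b_w, however, the coefficients f_v(â_i) become constants, and the conditional distribution of B_v is then normal with mean

$$E(B_v \mid \bar y_1, \ldots, \bar y_t, b_w) = \sum_{i=1}^{t} w_i f_v(\hat a_i)\, E(b_i \mid b_w)$$

and variance

$$\operatorname{var}(B_v \mid \bar y_1, \ldots, \bar y_t, b_w) = \sum_{i=1}^{t}\sum_{j=1}^{t} w_i w_j f_v(\hat a_i) f_v(\hat a_j)\, \operatorname{cov}(b_i, b_j \mid b_w).$$

The conditional moments of b_1, ..., b_t given b_w are computed from normal regression theory; since, under the hypothesis, E(b_i) = E(b_w) = β, cov(b_i, b_j) = 0 for i ≠ j, and cov(b_i, b_w) = var(b_w) = σ²/W, they are

$$E(b_i \mid b_w) = E(b_i) + \frac{\operatorname{cov}(b_i, b_w)}{\operatorname{var}(b_w)}\,[b_w - E(b_w)] = b_w,$$

$$\operatorname{var}(b_i \mid b_w) = \operatorname{var}(b_i)\left[1 - \frac{\operatorname{cov}^2(b_i, b_w)}{\operatorname{var}(b_i)\operatorname{var}(b_w)}\right] = \sigma^2\left(\frac{1}{w_i} - \frac{1}{W}\right),$$

$$\operatorname{cov}(b_i, b_j \mid b_w) = \operatorname{cov}(b_i, b_j) - \frac{\operatorname{cov}(b_i, b_w)\operatorname{cov}(b_j, b_w)}{\operatorname{var}(b_w)} = -\frac{\sigma^2}{W}, \qquad i \ne j.$$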


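The conditional mean and variance of B_v are therefore

$$E(B_v \mid \bar y_1, \ldots, \bar y_t, b_w) = b_w \sum_{i=1}^{t} w_i f_v(\hat a_i) = 0,$$

$$\operatorname{var}(B_v \mid \bar y_1, \ldots, \bar y_t, b_w) = \sigma^2 \sum_{i=1}^{t} w_i f_v^2(\hat a_i) - \frac{\sigma^2}{W}\Big[\sum_{i=1}^{t} w_i f_v(\hat a_i)\Big]^2 = \sigma^2.$$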


TABLE 1
INITIAL WEIGHT X AND AVERAGE GAIN Y IN 4 LOTS OF 10 PIGS EACH*

Lot              6               7               8               9
Pig          X      Y        X      Y        X      Y        X      Y
1           79    1.96      61    1.40      62    1.22      71    1.15
2           65    1.77      59    1.79      73    1.39      60    1.28
3           57    1.62      59    1.61      58    1.28      54    1.40
4           51    1.76      53    1.47      43    1.28      50    1.37
5           57    1.88      56    1.69      50    1.45      60    1.19
6           66    1.50      50    1.48      44    1.22      61    1.18
7           44    1.60      45    1.40      48    1.31      44    1.20
8           41    1.49      39    1.42      51    1.57      53    0.96
9           44    1.77      38    1.29      40    1.21      41    1.13
10          36    1.27      45    1.26      38    1.06      38    1.12
Totals     540   16.62     505   14.81     507   12.99     532   11.98
ΣX²       30,770          26,143          26,771          29,248
ΣY²       28.0068         22.1917         17.0569         14.4992
ΣXY       913.24          756.65          664.20          638.50

Experiment totals: ΣX = 2,084, ΣY = 56.40, ΣX² = 112,932, ΣXY = 2,972.59, ΣY² = 81.7546.

*Reproduced from Snedecor, Statistical Methods, 5th edition, by permission of the publisher, Iowa State College Press.
Thus, for fixed ȳ_1, ..., ȳ_t and b_w, the statistics B_v are each N(0, σ²). Furthermore, the conditional covariance between B_v and B_v′ is zero,

$$\operatorname{cov}(B_v, B_{v'} \mid \bar y_1, \ldots, \bar y_t, b_w) = \sigma^2 \sum_{i=1}^{t} w_i f_v(\hat a_i) f_{v'}(\hat a_i) = 0, \qquad v \ne v',$$

so the joint conditional distribution of B_1, ..., B_(t−1) is that of t − 1 independent N(0, σ²) chance variables. Since this conditional distribution is functionally independent of ȳ_1, ..., ȳ_t and b_w, it is also the unconditional distribution of the B's, and the statistics B_1, ..., B_(t−1) and b_w are all mutually independent. Finally, we use the fact that s² is independent of each B_v to complete the proof that B_v²/s² is distributed as Snedecor's F with 1 and Σ(n_i − 2) degrees of freedom.
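The conclusion that B_1 is exactly N(0, σ²) under parallel lines, although B_1 is not a linear function of the observations, can be spot-checked by simulation. This is our own check, not part of the authors' argument; the covariate layout, the intercepts `alphas` and the common slope `beta` are arbitrary hypothetical values, and the average of B_1²/σ² should come out close to 1.

```python
import math
import random


def b1_statistic(x_groups, y_groups):
    """B_1 (linear coefficient of slope on adjusted intercept) from raw data."""
    w, b, xbar, ybar = [], [], [], []
    for xs, ys in zip(x_groups, y_groups):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        w.append(sxx)
        b.append(sxy / sxx)
        xbar.append(mx)
        ybar.append(my)
    W = sum(w)
    b_w = sum(wi * bi for wi, bi in zip(w, b)) / W
    a_hat = [my - b_w * mx for my, mx in zip(ybar, xbar)]
    a_w = sum(wi * ai for wi, ai in zip(w, a_hat)) / W
    c1 = math.sqrt(sum(wi * (ai - a_w) ** 2 for wi, ai in zip(w, a_hat)))
    return sum(wi * bi * (ai - a_w) for wi, bi, ai in zip(w, b, a_hat)) / c1


def mean_b1_squared(reps=4000, sigma=1.0, seed=7):
    """Average of B_1^2 / sigma^2 over data simulated with parallel lines."""
    rng = random.Random(seed)
    x_groups = [[float(j + 3 * i) for j in range(8)] for i in range(4)]  # fixed X
    alphas, beta = [0.0, 1.0, 2.5, 4.0], 0.7     # hypothetical parameters
    total = 0.0
    for _ in range(reps):
        y_groups = [[a + beta * x + rng.gauss(0.0, sigma) for x in xs]
                    for a, xs in zip(alphas, x_groups)]
        total += b1_statistic(x_groups, y_groups) ** 2
    return total / reps / sigma ** 2


print(mean_b1_squared())   # expected to be near 1 if B_1 ~ N(0, sigma^2)
```

A fuller check would compare the whole simulated distribution of B_1²/s² with F(1, Σ(n_i − 2)), but the mean already detects a wrong variance.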


NUMERICAL EXAMPLE 


The computational procedure for this test of homogeneity of regression coefficients is illustrated with the example used by Snedecor [3] in his discussion of covariance in a completely randomized experiment. The data, duplicated here in Table 1, represent the initial weights X and average daily gains Y in 4 lots of 10 pigs each. The covariance analysis for comparing the lot mean daily gains adjusted to a common initial weight is shown in Table 2.

TABLE 2
ANALYSIS OF COVARIANCE

                                                   Deviations from Regression
Lot           f      Σx²       Σxy       Σy²      f    Sum of Squares   Mean Square
6             9    1,610.0    15.760    .3844     8        .2301
7             9      640.5     8.745    .2581     8        .1387
8             9    1,066.1     5.607    .1829     8        .1534
9             9      945.6     1.164    .1472     8        .1458
Within                                           32        .6680            .0208
Reg. Coef.                                        3        .0751            .0250
Common       36    4,262.2    31.276    .9726    35        .7431

b_w = 31.276/4,262.2 = .7338(10^(−2)).

The conventional test of homogeneity of regression coefficients,

$$F = \frac{\text{among regression coefficients mean square}}{\text{within-lot deviations from regression mean square}} = \frac{.0250}{.0208} = 1.20,$$

with 3 and 32 degrees of freedom, gives a non-significant F-value.

By our proposed test procedure the among-regression-coefficients sum of squares is partitioned into 3 single degree of freedom sums of squares for testing the linear, quadratic and cubic regressions of the within-lot regression coefficients on the adjusted intercepts. These components are: linear, B_1² = .0516, as shown in Table 3; quadratic, B_2² = .0177, as shown in Table 4; and cubic, B_3² = .0751 − .0516 − .0177 = .0058. None of these components is significant when tested against the error mean square of .0208.
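The whole partition can be reproduced from the raw data. In this sketch (ours, with hypothetical names) the polynomials f_v are built by weighted Gram-Schmidt orthogonalization of the powers 1, â, â², â³, which yields polynomials satisfying the orthonormality conditions stated above. The data are Table 1 as reconstructed to agree with its printed column totals ΣY, ΣY² and ΣXY. Because the sketch uses unrounded adjusted intercepts rather than the four-decimal values of Tables 3 and 4, its quadratic and cubic components may differ somewhat from .0177 and .0058.

```python
import math

# Table 1: initial weight X and average daily gain Y, 4 lots of 10 pigs.
X = {
    6: [79, 65, 57, 51, 57, 66, 44, 41, 44, 36],
    7: [61, 59, 59, 53, 56, 50, 45, 39, 38, 45],
    8: [62, 73, 58, 43, 50, 44, 48, 51, 40, 38],
    9: [71, 60, 54, 50, 60, 61, 44, 53, 41, 38],
}
Y = {
    6: [1.96, 1.77, 1.62, 1.76, 1.88, 1.50, 1.60, 1.49, 1.77, 1.27],
    7: [1.40, 1.79, 1.61, 1.47, 1.69, 1.48, 1.40, 1.42, 1.29, 1.26],
    8: [1.22, 1.39, 1.28, 1.28, 1.45, 1.22, 1.31, 1.57, 1.21, 1.06],
    9: [1.15, 1.28, 1.40, 1.37, 1.19, 1.18, 1.20, 0.96, 1.13, 1.12],
}


def polynomial_components(w, b, a_hat, degree):
    """[B_1^2, ..., B_degree^2] via weighted (modified) Gram-Schmidt on powers of a_hat."""
    def dot(u, v):
        return sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))
    basis = [[1.0 / math.sqrt(sum(w))] * len(w)]     # f_0: constant with unit norm
    for v in range(1, degree + 1):
        p = [a ** v for a in a_hat]
        for f in basis:                               # remove lower-degree parts
            coef = dot(p, f)
            p = [pi - coef * fi for pi, fi in zip(p, f)]
        norm = math.sqrt(dot(p, p))                   # plays the role of c_v
        basis.append([pi / norm for pi in p])
    return [dot(b, f) ** 2 for f in basis[1:]]        # B_v = sum w_i b_i f_v(a_i)


w, b, xbar, ybar = [], [], [], []
dev_ss, dev_df = 0.0, 0
for lot in sorted(X):
    xs, ys = X[lot], Y[lot]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    w.append(sxx)
    b.append(sxy / sxx)
    xbar.append(mx)
    ybar.append(my)
    dev_ss += syy - sxy ** 2 / sxx                    # within-lot deviations
    dev_df += n - 2
s2 = dev_ss / dev_df
b_w = sum(wi * bi for wi, bi in zip(w, b)) / sum(w)
a_hat = [my - b_w * mx for my, mx in zip(ybar, xbar)]
components = polynomial_components(w, b, a_hat, degree=len(w) - 1)
for name, ss in zip(("linear", "quadratic", "cubic"), components):
    print(f"{name:9s} B^2 = {ss:.4f}   F(1, {dev_df}) = {ss / s2:.2f}")
```

With four lots the three components exhaust the among-regression-coefficients sum of squares, so they must add up to .0751 apart from rounding.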


TABLE 3
CALCULATION OF LINEAR SUM OF SQUARES

Lot     x̄_i     ȳ_i      w_i      w_i b_i    (ȳ_i − b_w x̄_i)²
6      54.0    1.662   1610.0    15.760       1.6020
7      50.5    1.481    640.5     8.745       1.2330
8      50.7    1.299   1066.1     5.607        .8593
9      53.2    1.198    945.6     1.164        .6522

Σ w_i x̄_i = 223,642.44,                x̄_w = 52.4711,
Σ w_i ȳ_i = 6,142.0932,                ȳ_w = 1.4411,
Σ w_i b_i ȳ_i = 47.8224,               ȳ_w Σ w_i b_i = 45.0718,
b_w Σ w_i b_i x̄_i = 12.0260,           b_w x̄_w Σ w_i b_i = 12.0413,
Σ w_i(ȳ_i − b_w x̄_i)² = 4901.7765,     W â_w² = 4753.6317,

B_1² = (2.7659)²/148.1449 = .0516,

Slope = 2.7659/148.1449 = .0187.
THE DUAL RESULT FOR TESTING HOMOGENEITY
OF INTERCEPTS


In some applications, the assumption of constant slopes and variable intercepts is replaced by the dual assumption of constant intercepts and variable slopes. The latter assumption is the basis, for example, of the technique used in fishery biology for "back-calculating" the length of a fish at earlier ages from the lengths of the scale annuli [4]. In the preceding numerical example, if the i'th treatment mean took the form α + β_iX, we would expect the linear regression of b_i on â_i to have a slope approximately equal to 1/x̄ = .0192, which is very close to the value .0187 obtained by the computations shown in Table 3. This dual model may be closer to reality more frequently than is realized.
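The value 1/x̄ follows from a short argument, sketched here under an extra assumption of ours: that the lot means x̄_i are all close to the overall mean x̄ = 2,084/40 = 52.1.

```latex
\begin{aligned}
\bar y_i &\approx \alpha + \beta_i \bar x_i, \qquad b_i \approx \beta_i, \\
\hat a_i &= \bar y_i - b_w \bar x_i \approx \alpha + (b_i - b_w)\,\bar x, \\
b_i &\approx b_w + \frac{\hat a_i - \alpha}{\bar x},
\end{aligned}
```

so the regression of b_i on â_i should have a slope near 1/x̄ = 1/52.1 ≈ .0192.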

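Under the assumption of homogeneous intercepts, the least squares estimator of the common parameter α is the weighted average of the individual intercepts a_i = ȳ_i − b_i x̄_i, with weights q_i being proportional to the inverse of the variance of a_i:

$$q_i = \frac{n_i w_i}{w_i + n_i \bar x_i^2} = \frac{n_i w_i}{\sum_{j=1}^{n_i} X_{ij}^2}, \qquad \bar a_q = \frac{\sum_{i=1}^{t} q_i a_i}{\sum_{i=1}^{t} q_i}.$$

The adjusted slopes, or the least squares estimators of β_i under the hypothesis of constant intercepts, take the form

$$\hat\beta_i = \frac{\sum_{j=1}^{n_i} X_{ij}(Y_{ij} - \bar a_q)}{\sum_{j=1}^{n_i} X_{ij}^2}.$$

Homogeneity of the intercepts may be tested by

$$F = \frac{\sum_{i=1}^{t} q_i (a_i - \bar a_q)^2}{s^2 (t - 1)}$$

or by testing the orthogonal polynomial regression of the intercepts a_i on the adjusted slopes β̂_i,

$$a_i - \bar a_q = A_1 h_1(\hat\beta_i) + \cdots + A_{t-1} h_{t-1}(\hat\beta_i),$$

where

$$\sum_{i=1}^{t} q_i h_v(\hat\beta_i) h_{v'}(\hat\beta_i) = \begin{cases} 0 & \text{for } v \ne v', \\ 1 & \text{for } v = v', \end{cases} \qquad A_v = \sum_{i=1}^{t} q_i a_i h_v(\hat\beta_i).$$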
TABLE 4
CALCULATION OF QUADRATIC SUM OF SQUARES

Lot    x̄_i    ȳ_i     â_i     â_i²     â_i³     w_i      c_2 f_2(â_i)     [c_2 f_2(â_i)]²    w_i b_i
6     54.0   1.662   1.2657   1.6020   2.0277   1610.0    .1142(10^(−1))   .1304(10^(−3))    15.760
7     50.5   1.481   1.1104   1.2330   1.3691    640.5   −.3138(10^(−1))   .9847(10^(−3))     8.745
8     50.7   1.299    .9270    .8593    .7966   1066.1   −.1987(10^(−1))   .3948(10^(−3))     5.607
9     53.2   1.198    .8076    .6522    .5267    945.6    .2382(10^(−1))   .5674(10^(−3))     1.164

Σ w_i â_i = 4500.9295,     â_w = 1.0560,
Σ w_i â_i² = 4901.7766,    Σ w_i â_i³ = 5488.8083,
Σ w_i â_i² − â_w Σ w_i â_i = 148.7950,

c_2 f_2(â) = â² − 2.1004 â + 1.0679,   where 1.0679 = 2.2180 − 1.1501,

Σ w_i [c_2 f_2(â_i)]² = 1.7981,

B_2² = [Σ w_i b_i c_2 f_2(â_i)]² / Σ w_i [c_2 f_2(â_i)]² = .01766.

Since ā_q is statistically independent of Σ_j X_ij Y_ij, then by the same argument used earlier the polynomial regression coefficients A_1, ..., A_(t−1) are conditionally distributed as t − 1 independent N(0, σ²) chance variables for fixed values of Σ_j X_1j Y_1j, ..., Σ_j X_tj Y_tj and ā_q, and hence are also unconditionally distributed in this manner; that is, A_1², ..., A_(t−1)² are independently distributed as σ²χ²_1 variables.
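The intercept weights and the intercept test can be sketched from the per-lot summaries of the pig example. This is our own illustration, not a computation reported by the authors; it assumes the individual intercepts are a_i = ȳ_i − b_i x̄_i, takes the adjusted slope to be the least-squares slope through the common intercept, Σ_j X_ij(Y_ij − ā_q)/Σ_j X_ij², and uses s² = .0208 from Table 2. It also confirms that the two expressions for q_i agree, because Σ_j X_ij² = w_i + n_i x̄_i².

```python
# Per-lot summaries from Tables 1 and 2 (n_i = 10 pigs per lot).
n = [10, 10, 10, 10]
xbar = [54.0, 50.5, 50.7, 53.2]
ybar = [1.662, 1.481, 1.299, 1.198]
w = [1610.0, 640.5, 1066.1, 945.6]       # sum of squares of X about the lot mean
sxy = [15.760, 8.745, 5.607, 1.164]      # sum of products about the lot means
sum_x = [540, 505, 507, 532]             # raw totals of X (Table 1)
sum_x2 = [30770, 26143, 26771, 29248]    # raw sums of X^2 (Table 1)
sum_xy = [913.24, 756.65, 664.20, 638.50]
s2 = 0.0208                              # within-lot error mean square (Table 2)

b = [p / q for p, q in zip(sxy, w)]
a = [yb - bi * xb for yb, bi, xb in zip(ybar, b, xbar)]      # separate intercepts
q = [ni * wi / (wi + ni * xb ** 2) for ni, wi, xb in zip(n, w, xbar)]
a_q = sum(qi * ai for qi, ai in zip(q, a)) / sum(q)          # common intercept
F = sum(qi * (ai - a_q) ** 2 for qi, ai in zip(q, a)) / (s2 * (len(a) - 1))
# Least-squares slopes through the common intercept (our reading of the text).
beta_adj = [(sp - a_q * sx) / sx2 for sp, sx, sx2 in zip(sum_xy, sum_x, sum_x2)]

print("q_i =", [round(qi, 4) for qi in q])
print("common intercept =", round(a_q, 4), "  F =", round(F, 2))
print("adjusted slopes =", [round(x, 5) for x in beta_adj])
```

The F ratio printed here would be referred to F with t − 1 = 3 and 32 degrees of freedom; the paper itself does not report this value.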


DISCUSSION 


The test procedure described here is analogous to Tukey's test for non-additivity in a two-way classification in the sense that our B_1² and Tukey's single degree of freedom sum of squares for non-additivity become identical under comparable conditions. In order to reduce a one-way covariance design to a two-way factorial design we must add the restriction that the same levels X_1, ..., X_r of the covariate occur under each of the t treatments, so the covariance model becomes

$$Y_{ij} = \alpha_i + \beta X_j + e_{ij}$$

or

$$Y_{ij} = \mu + \tau_i + \beta(X_j - \bar x) + e_{ij}.$$

Conversely, this same model may be obtained from the general two-way additive model

$$Y_{ij} = \mu + \tau_i + \rho_j + e_{ij}$$

by assuming that the levels of the second factor are given by X_1, ..., X_r and that the effect of this factor is a linear function of X; that is, ρ_j = β(X_j − x̄). Under these comparable conditions, the interaction (or non-additive) effect (τρ)_ij enters both models in the form of heterogeneous slopes, so the full model becomes

$$Y_{ij} = \mu + \tau_i + \beta(X_j - \bar x) + (\beta_i - \beta)(X_j - \bar x) + e_{ij} = \mu + \tau_i + \rho_j + (\tau\rho)_{ij} + e_{ij},$$

and Tukey's sum of squares for non-additivity and our B_1² become identical. As demonstrated by J. Cassady [5], Tukey's test for non-additivity in the two-way case may be extended to include individual degrees of freedom which correspond to our B_1², ..., B_(t−1)², and which become identical to ours under these comparable conditions.

This test procedure for the one-way covariance design is constructed specifically to have power against some restricted classes of heterogeneous but still linear alternatives to the hypothesis of homogeneous (parallel) linear regressions. Referees have pointed out to the authors, however, that under certain conditions this test also has power against nonlinear alternatives to the hypothesis of parallel straight lines. For example, in the special case where the true underlying regression is quadratic and is the same for every treatment, configurations of X-values for the different treatments can be found which will produce a significant correlation between slopes and intercepts of the best-fitting straight-line regressions within the treatments. Because of such possibilities, some caution must be exercised in interpreting a significant F-value in this (or any other) test of linear homogeneity; that is, the significance may be due to actual linear heterogeneity or to an error in the original assumption that the within-treatment regressions are linear. A case in point is the numerical example used here to illustrate the computational techniques; a referee has shown that for these data the sum of squares attributable to inclusion of a quadratic term in the within-treatment regression model falls at the same level of significance (approximately 6 percent) as our B_1². If the hypothesis of parallel linear regressions is rejected, this then raises the question of whether the true regressions are linear but heterogeneous or whether they are linear at all.
THE STATISTICAL ESTIMATION OF A
RECTANGULAR HYPERBOLA


N. Hey
Radcliffe Infirmary, Oxford, England
AND
M. H. Hey
Dept. of Mineralogy, British Museum (Natural History), London S.W.7, England


SUMMARY 


A simple method is described for fitting a function of the type (X − a)(Y − b) = c to a set of observations by minimizing a quantity that closely approximates the sum of the squared standardized normal residuals. It is also shown that certain difficulties hitherto associated with the maximum likelihood solution to a relation between variables, all of which are liable to experimental error, are due to confusion between the root-mean-square standardized residual and the root-mean-square standardized normal residual.


INTRODUCTION 


If we have a number J of sets of experimental data in two variables x and y, both subject to errors of measurement, and if we can reasonably assume that the errors of measurement are independent of one another and are normally or quasi-normally distributed with zero mean, and that the true relation between x and y is some known function f(x, y; a, b, ...), where some at least of the coefficients a, b, ... are unknown, then the best-fitting curve through the experimental points will be that which minimizes the quantity

$$S = \sum_{i=1}^{J}\left[\frac{(x_i - X_i)^2}{\epsilon_{x_i}^2} + \frac{(y_i - Y_i)^2}{\epsilon_{y_i}^2}\right],$$

where (x_i, y_i) is an experimental point, (X_i, Y_i) an undetermined point on the curve, and ε_(x_i), ε_(y_i) the standard deviations of the errors of measurement of x_i and y_i respectively.

A direct and rigorous solution of this minimal problem is only possible in a very few special cases, the most important of which is when the curve to be fitted is a straight line and the ratio ε_(x_i)/ε_(y_i) is the same for all points. Many types of equation can be brought into a linear form by a simple change of variable, such as z = log x, and though the new error ratio ε_(z_i)/ε_(y_i) will not in general be constant, its variation over the range of the measurements is often small enough to be disregarded, giving a solution that is practically satisfactory though not rigorously optimal (we also note that if ε_(x_i) is normally distributed, ε_(z_i) will not in general be so; but again, the difference can commonly be disregarded if the errors are not too large).

In the present note we consider a small group of curves that cannot be reduced to linearity, namely the rectangular hyperbola (x − a)(y − b) = c, where a, b, and c are unknown constants, and those curves, such as y = pe”" or (x − a)(y − b) = cx or (x − a)(y − b) = cxy, that can be reduced to the simple rectangular hyperbola by a change of variable (note that if either a or b is known or zero, the rectangular hyperbola can be reduced to linear form; we assume that both are unknown and the equation thus irreducible).

Equations of this type are rarely encountered in current biological work, but this is probably because their irreducibility to linear form makes them intractable rather than because they would not be appropriate. R. V. Brown [1952] fitted a rectangular hyperbola to dose-response curves, and Lloyd, Jukes, and Cunningham [1958] suggested the simple form (x − a)(y − b) = c and an exponential form reducible to it as alternative curves for the ventilation-oxygen relation in respiratory studies. Indeed, some curves of this type may well be appropriate in many physiological relations, whenever there is no response to a stimulus below a certain threshold value and the response approaches a saturation value as the stimulus increases.

Brown [1952] fitted a hyperbola to his data by minimizing the sum of the squares of the differences between the area of a rectangle defined by the asymptotes to the estimated curve ($X - a = 0$ and $Y - b = 0$) and an experimental point P, and the rectangle of constant area c defined by the asymptotes and any point on the estimated curve (Figure 1). If the true curve is $(\xi - \alpha)(\eta - \beta) = \gamma$ and $(X - a)(Y - b) = c$ is an estimate of it, and if the coordinates of the experimental point P are $(x, y)$, this difference in area is equal to $(x - a)(y - b) - c$, and accordingly Brown minimizes

$$\sum \left[(x - a)(y - b) - c\right]^2.$$

This solution is not wholly satisfactory: the quantity minimized is graphically the square of an area, whereas the best (maximum likelihood) solution should be obtained by minimizing the sum of the squared standardized residuals,

$$\sum \left\{ \frac{(x - \xi)^2}{e_x^2} + \frac{(y - \eta)^2}{e_y^2} \right\}.$$


If we take $e_x$ and $e_y$ as the units of measurement of x and y respectively, the standardized normal residual at $P(x_i, y_i)$ becomes the normal to the curve from P (PQ, Figure 1). If $e_x$ and $e_y$ are constant, it will be evident from the geometry of the hyperbola that when $x - a$ is large, $(x - a)(y - b) - c$ approximates to $PQ\,(x - a)$, where PQ is the normal residual; and when $y - b$ is large, $(x - a)(y - b) - c$ approximates to $PQ\,(y - b)$. Accordingly, compared to a procedure that uses squared normal residuals, points with high values of x or y receive excessive weight.
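A minimal Python sketch of this point, assuming the six observations of the worked example below (x and y as reconstructed in Table 1) and the final curve reported there; `brown_terms` is a hypothetical helper, not part of the original method. It shows how much of Brown's sum comes from the point with the largest x.

```python
# Observations of the worked example (Table 1): x = alveolar pO2, y = slope.
X = [54, 69, 90, 122, 208, 637]
Y = [2.88, 2.39, 2.16, 1.90, 1.59, 1.57]


def brown_terms(a, b, c):
    """Per-point terms [(x - a)(y - b) - c]^2 of Brown's criterion."""
    return [((x - a) * (y - b) - c) ** 2 for x, y in zip(X, Y)]


# Final curve of the worked example: (X - 24.7207)(Y - 1.44908) = 42.2284
terms = brown_terms(24.7207, 1.44908, 42.2284)
total = sum(terms)
for x, t in zip(X, terms):
    print(f"x = {x:4d}   term = {t:9.3f}   share = {t / total:6.1%}")
```

At this curve the x = 637 point supplies roughly three quarters of Brown's sum, although its standardized normal residual is smaller than that of the x = 208 point.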


THEORY 


A way round this difficulty is available. Consider any experimental point P (Figure 1). Let the variables be normalized by taking their assessed experimental errors $e_x$, $e_y$ as units of coordinates; let OAB, ODE be the asymptotes to the estimated curve CQF. Draw CPD parallel to OAB and APF parallel to ODE; join CF and draw PT at right angles to CF, and PQ normal to the curve. Brown's estimator is $\sum[(x - a)(y - b) - c]^2$, or $\sum R^2$ if we write $(x - a)(y - b) - c = R$. Now in the normalized units the area $APCB = DPFE = R/(e_x e_y)$, and as $DP = (y - b)/e_y$ and $AP = (x - a)/e_x$, we have $R = PF\,(y - b)\,e_x = PC\,(x - a)\,e_y$. But $PC \cdot PF = PT \cdot CF$ = twice the area of triangle CPF, and $CF^2 = PC^2 + PF^2$. Hence

$$PT^2 = \frac{PC^2 \cdot PF^2}{PC^2 + PF^2} = \frac{R^2}{(x - a)^2 e_y^2 + (y - b)^2 e_x^2}.$$

If the experimental errors are small, the experimental points will be near the curve, the curve CQF will approximate to the straight line CTF, and the normal residual PQ will approximate to PT. Thus we have to a close approximation, writing

$$F^2 = (x - a)^2 e_y^2 + (y - b)^2 e_x^2,$$

$$\sum PQ^2 \approx \sum PT^2 = \sum \left(R^2/F^2\right).$$
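The quality of the approximation PQ ≈ PT can be checked numerically. The sketch below assumes the worked example's data and its rough estimates a' = 30, b' = 1.5, c' = 33.79; it computes PT² = R²/F² directly and finds PQ² by a golden-section search along the arc CF of the estimated curve (the foot of the normal must lie on that arc, because every other point of a decreasing curve is farther from P than C or F). The function names are illustrative only.

```python
import math

X = [54, 69, 90, 122, 208, 637]
Y = [2.88, 2.39, 2.16, 1.90, 1.59, 1.57]
EX = [1, 1, 1, 1, 2, 2]                           # assessed errors in x
EY = [0.288, 0.239, 0.216, 0.190, 0.159, 0.157]   # assessed errors in y


def pt_squared(x, y, ex, ey, a, b, c):
    """Approximate squared standardized normal residual PT^2 = R^2 / F^2."""
    r = (x - a) * (y - b) - c
    f2 = ((x - a) * ey) ** 2 + ((y - b) * ex) ** 2
    return r * r / f2


def pq_squared(x, y, ex, ey, a, b, c, iters=200):
    """Squared standardized distance from P to the curve Y = b + c/(X - a),
    found by golden-section search over the arc between C and F."""
    if not (x > a and y > b and c > 0):
        raise ValueError("sketch assumes P lies in the quadrant of the branch")
    x_f = a + c / (y - b)                  # abscissa of F (same y as P)
    lo, hi = min(x, x_f), max(x, x_f)      # C has the same abscissa as P

    def dist2(xc):
        yc = b + c / (xc - a)
        return ((xc - x) / ex) ** 2 + ((yc - y) / ey) ** 2

    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        m1, m2 = hi - g * (hi - lo), lo + g * (hi - lo)
        if dist2(m1) < dist2(m2):
            hi = m2
        else:
            lo = m1
    return dist2(0.5 * (lo + hi))


a, b, c = 30.0, 1.5, 33.79               # rough estimates a', b', c' of the example
pt2 = [pt_squared(x, y, ex, ey, a, b, c) for x, y, ex, ey in zip(X, Y, EX, EY)]
pq2 = [pq_squared(x, y, ex, ey, a, b, c) for x, y, ex, ey in zip(X, Y, EX, EY)]
for x, t, q in zip(X, pt2, pq2):
    print(f"x = {x:3d}   PT^2 = {t:.5f}   PQ^2 = {q:.5f}")
print("sum of PT^2:", round(sum(pt2), 4))
```

For these data the two quantities should agree closely at every point, and the sum of PT² should reproduce the 0.6513 of Table 2.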
Now it is usually possible to obtain a fair approximation to a and b from a simple graph of x against y, and since $y - b$ will be large when $x - a$ is small and vice versa, moderate errors in these preliminary estimates of a and b will not affect the sums $F^2$ appreciably. We can, therefore, proceed to fit a rectangular hyperbola to a set of data, minimizing a quantity that closely approximates to the sum of squared normal standardized residuals, by estimating a and b graphically, computing $F^2$ for each point using these preliminary values, and minimizing the sum $\sum (R^2/F^2)$ or a quantity proportional to this sum. In practice, this is done by weighting the points inversely as their values of $F^2$; the weighting factor w is taken as 1 or 2 for that observation for which $F^2$ is greatest, and the other w-values are calculated only to the nearest integer. The procedure is equally applicable when $e_x$ or $e_y$ or both vary from one experimental point to another; it is also unnecessary to know the absolute values of $e_x$ and $e_y$, so long as their ratio and their variation from point to point are known. In practice, it is found that a knowledge of the number of significant figures that can properly be written for each measurement is an adequate measure of the errors $e_x$ and $e_y$.
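A sketch of the weighting rule, assuming the data, errors, and preliminary values a' = 30, b' = 1.5 of the worked example below; the point with the largest F² receives w = 1 and the others are rounded to the nearest integer, as described above. The function name `weights` is hypothetical.

```python
X = [54, 69, 90, 122, 208, 637]
Y = [2.88, 2.39, 2.16, 1.90, 1.59, 1.57]
EX = [1, 1, 1, 1, 2, 2]                           # assessed error in x (mm Hg)
EY = [0.288, 0.239, 0.216, 0.190, 0.159, 0.157]   # 10 percent of y


def weights(a0, b0, largest_weight=1):
    """Return F^2 per point and integer weights inversely proportional to F^2."""
    F2 = [((x - a0) * ey) ** 2 + ((y - b0) * ex) ** 2
          for x, y, ex, ey in zip(X, Y, EX, EY)]
    f2_max = max(F2)
    w = [round(largest_weight * f2_max / f2) for f2 in F2]
    return F2, w


F2, w = weights(30.0, 1.5)
print([round(f, 3) for f in F2])
print(w)
```

With these inputs the sketch reproduces the weights 183, 104, 54, 30, 11, 1 listed in Table 1.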

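Substituting R/F for $(x - a)(y - b) - c$ (= R) in Brown's solution, we arrive at the matrix equation:

$$\begin{pmatrix} \sum wy^2 & \sum wxy & \sum wy \\ \sum wxy & \sum wx^2 & \sum wx \\ \sum wy & \sum wx & \sum w \end{pmatrix} \begin{pmatrix} a \\ b \\ c - ab \end{pmatrix} = \begin{pmatrix} \sum wxy^2 \\ \sum wx^2 y \\ \sum wxy \end{pmatrix}.$$

This equation will often serve where an approximate solution is required, but it takes no account of the fact that $F^2$ is a function of a and b; it can readily be shown that the variation of $F^2$ calls for additional terms in the right-hand column, which becomes:

$$\begin{pmatrix} \sum wxy^2 - \sum w(x - a')e_y^2 R^2/F^2 \\ \sum wx^2 y - \sum w(y - b')e_x^2 R^2/F^2 \\ \sum wxy \end{pmatrix},$$

where a' and b' (and, in R, c') are the preliminary values used in computing $F^2$ and R.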


The additional terms are small, but may affect the result appreciably, as will be seen from the numerical example below. It may be noted that since PT is always greater than the true normal PQ when P lies below the curve (Figure 1) and always less when P lies above, the new solution is not strictly accurate, but, since the exact expression for PQ involves the fourth powers of x and y, a rigorous solution is likely to be unmanageable.
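The matrix equation can be assembled mechanically. The sketch below assumes the worked example's data, the integer weights of Table 1, and the preliminary values a' = 30, b' = 1.5, c' = Σ(x − a')(y − b')/N; it returns the symmetric coefficient matrix and the corrected right-hand column. The function names are illustrative.

```python
X = [54, 69, 90, 122, 208, 637]
Y = [2.88, 2.39, 2.16, 1.90, 1.59, 1.57]
EX = [1, 1, 1, 1, 2, 2]
EY = [0.288, 0.239, 0.216, 0.190, 0.159, 0.157]


def weighted_sum(w, values):
    """Sum of w * value over the points."""
    return sum(wi * v for wi, v in zip(w, values))


def normal_equations(a0, b0, c0, w):
    """Coefficient matrix and corrected right-hand column, with F^2 and R
    evaluated at the preliminary values (a0, b0, c0)."""
    F2 = [((x - a0) * ey) ** 2 + ((y - b0) * ex) ** 2
          for x, y, ex, ey in zip(X, Y, EX, EY)]
    R = [(x - a0) * (y - b0) - c0 for x, y in zip(X, Y)]

    def s(values):
        return weighted_sum(w, values)

    Swyy, Swxy, Swy = s([y * y for y in Y]), s([x * y for x, y in zip(X, Y)]), s(Y)
    Swxx, Swx, Sw = s([x * x for x in X]), s(X), sum(w)
    M = [[Swyy, Swxy, Swy],
         [Swxy, Swxx, Swx],
         [Swy, Swx, Sw]]
    # correction terms arising from the dependence of F^2 on a and b
    corr_a = s([(x - a0) * ey ** 2 * r ** 2 / f2 for x, ey, r, f2 in zip(X, EY, R, F2)])
    corr_b = s([(y - b0) * ex ** 2 * r ** 2 / f2 for y, ex, r, f2 in zip(Y, EX, R, F2)])
    rhs = [s([x * y * y for x, y in zip(X, Y)]) - corr_a,
           s([x * x * y for x, y in zip(X, Y)]) - corr_b,
           Swxy]
    return M, rhs, corr_a, corr_b


w = [183, 104, 54, 30, 11, 1]                     # integer weights of Table 1
a0, b0 = 30.0, 1.5
c0 = sum((x - a0) * (y - b0) for x, y in zip(X, Y)) / len(X)   # c' = 33.79
M, rhs, corr_a, corr_b = normal_equations(a0, b0, c0, w)
for row in M:
    print(["%.4f" % v for v in row])
print("corrections:", round(corr_a, 4), round(corr_b, 4))
print("right-hand column:", ["%.4f" % v for v in rhs])
```

The entries should match the printed matrix equation of the example (for instance Σwy² = 2502.45, with correction terms of about 58.421 and 12.24).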

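Whether the approximate or the more exact form of the equation is used, it is solved by multiplication by the reciprocal of the square matrix, giving (for the exact form):

$$\begin{pmatrix} a \\ b \\ c - ab \end{pmatrix} = \frac{1}{D} \begin{pmatrix} H & P & Q \\ P & K & R \\ Q & R & L \end{pmatrix} \begin{pmatrix} \sum wxy^2 - \sum w(x - a')e_y^2 R^2/F^2 \\ \sum wx^2 y - \sum w(y - b')e_x^2 R^2/F^2 \\ \sum wxy \end{pmatrix},$$

where

$$H = \sum w \sum wx^2 - \left(\sum wx\right)^2, \qquad K = \sum w \sum wy^2 - \left(\sum wy\right)^2, \qquad L = \sum wx^2 \sum wy^2 - \left(\sum wxy\right)^2,$$

$$P = \sum wx \sum wy - \sum w \sum wxy, \qquad Q = \sum wx \sum wxy - \sum wx^2 \sum wy, \qquad R = \sum wy \sum wxy - \sum wx \sum wy^2,$$

$$D = H\sum wy^2 + P\sum wxy + Q\sum wy = P\sum wxy + K\sum wx^2 + R\sum wx = Q\sum wy + R\sum wx + L\sum w,$$

and this yields the solution:

$$a = \left[H\left\{\sum wxy^2 - \sum w(x - a')e_y^2 R^2/F^2\right\} + P\left\{\sum wx^2 y - \sum w(y - b')e_x^2 R^2/F^2\right\} + Q\sum wxy\right]/D,$$

$$b = \left[P\left\{\sum wxy^2 - \sum w(x - a')e_y^2 R^2/F^2\right\} + K\left\{\sum wx^2 y - \sum w(y - b')e_x^2 R^2/F^2\right\} + R\sum wxy\right]/D,$$

$$c - ab = \left[Q\left\{\sum wxy^2 - \sum w(x - a')e_y^2 R^2/F^2\right\} + R\left\{\sum wx^2 y - \sum w(y - b')e_x^2 R^2/F^2\right\} + L\sum wxy\right]/D.$$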

If the values found for a and b differ substantially from the preliminary values a' and b', the solution can be refined by reiteration. In the applications so far made, this has proved unnecessary; the equations are not very sensitive to variations in a' and b'. In many examples the difference between Brown's curve and ours was surprisingly small, but the present procedure is statistically better based, and the additional labour in computation is slight.
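Putting the pieces together, the following sketch solves the normal equations through the cofactors H, K, L, P, Q, R defined above and repeats the step until the estimates settle. It assumes the worked example's data. Unlike the hand procedure, it uses unrounded weights F²max/F² by default (rounding is available to reproduce the first approximation), so later iterates need not match Table 3 digit for digit. The function names are hypothetical.

```python
X = [54, 69, 90, 122, 208, 637]
Y = [2.88, 2.39, 2.16, 1.90, 1.59, 1.57]
EX = [1, 1, 1, 1, 2, 2]
EY = [0.288, 0.239, 0.216, 0.190, 0.159, 0.157]


def fit_step(a0, b0, c0, round_weights=False, correct=True):
    """One pass of the weighted matrix equation; returns the new (a, b, c)."""
    F2 = [((x - a0) * ey) ** 2 + ((y - b0) * ex) ** 2
          for x, y, ex, ey in zip(X, Y, EX, EY)]
    w = [max(F2) / f2 for f2 in F2]
    if round_weights:
        w = [round(wi) for wi in w]
    R = [(x - a0) * (y - b0) - c0 for x, y in zip(X, Y)]

    def s(values):
        return sum(wi * v for wi, v in zip(w, values))

    A, B, C = s([y * y for y in Y]), s([x * y for x, y in zip(X, Y)]), s(Y)
    E, G, N = s([x * x for x in X]), s(X), sum(w)
    r1 = s([x * y * y for x, y in zip(X, Y)])
    r2 = s([x * x * y for x, y in zip(X, Y)])
    r3 = B
    if correct:  # additional terms for the variation of F^2
        r1 -= s([(x - a0) * ey ** 2 * r ** 2 / f2 for x, ey, r, f2 in zip(X, EY, R, F2)])
        r2 -= s([(y - b0) * ex ** 2 * r ** 2 / f2 for y, ex, r, f2 in zip(Y, EX, R, F2)])
    # cofactors of the symmetric matrix, named as in the text (Rc for R)
    H, K, L = N * E - G * G, N * A - C * C, A * E - B * B
    P, Q, Rc = G * C - N * B, G * B - E * C, C * B - G * A
    D = H * A + P * B + Q * C
    a = (H * r1 + P * r2 + Q * r3) / D
    b = (P * r1 + K * r2 + Rc * r3) / D
    d = (Q * r1 + Rc * r2 + L * r3) / D          # d = c - ab
    return a, b, d + a * b


def sum_normal_sq(a, b, c):
    """Sum of R^2/F^2 evaluated at (a, b, c)."""
    return sum(((x - a) * (y - b) - c) ** 2
               / (((x - a) * ey) ** 2 + ((y - b) * ex) ** 2)
               for x, y, ex, ey in zip(X, Y, EX, EY))


def iterate(correct=True, steps=60):
    a, b, c = 30.0, 1.5, 33.79                   # rough estimates a', b', c'
    for _ in range(steps):
        a, b, c = fit_step(a, b, c, correct=correct)
    return a, b, c


first = fit_step(30.0, 1.5, 33.79, round_weights=True)
final = iterate(correct=True)
plain = iterate(correct=False)
print("first approximation:", first)
print("with correction terms:", final, sum_normal_sq(*final))
print("without them:", plain, sum_normal_sq(*plain))
```

With rounded weights the first step should give approximately a = 23.225, b = 1.441, c = 44.26, as in the example; the corrected iteration should settle near the Table 3 values with ΣR²/F² close to 0.526, while the uncorrected version ends at a larger ΣR²/F², which is the behaviour reported in the text.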
FIGURE 1
Derivation of an approximation, PT, to the normal residual PQ.


EXAMPLE 


As a numerical example we shall use a set of data on the quantitative relation between pulmonary ventilation and the alveolar gas pressures. We plot the values graphically (Figure 2), and from the graph we derive the provisional estimates a' = 30 and b' = 1.5. We estimate that $p\mathrm{O}_2$ (x) can be relied on to about ±1.0 mm Hg, except for high values; for these we estimate the error at ±2 mm Hg. The error in the determination of the slope (y) is roughly proportional to the slope itself, and is estimated at 10 percent.

We tabulate x and y and their estimated errors $e_x$ and $e_y$, calculate $(x - a')e_y$, $(y - b')e_x$, and $F^2$, and then, by dividing each value of $F^2$ into the largest, we obtain the weighting factors w, which we round off to the nearest integer:

TABLE 1

x      y      e_x    e_y      (x − a')e_y   (y − b')e_x   F²          w
54     2.88   1      0.288    6.912         1.38          49.680      183
69     2.39   1      0.239    9.321         0.89          87.673      104
90     2.16   1      0.216    12.960        0.66          168.397     54
122    1.90   1      0.190    17.480        0.40          305.710     30
208    1.59   2      0.159    28.302        0.18          801.036     11
637    1.57   2      0.157    95.299        0.14          9081.919    1

To form a first approximation to the small terms in the right-hand column of the matrix equation, we require an approximate value of c, as well as of a and b, in order to form $R = (x - a')(y - b') - c'$; it is convenient to compute $(x - a')(y - b')$ for each point and take $c' = \sum (x - a')(y - b')/N = 33.79$. We then calculate $(x - a')e_y^2 R^2/F^2$ and $(y - b')e_x^2 R^2/F^2$ for each point.

FIGURE 2
Curves fitted to the same experimental points: a, by Brown's method, $(X - 41.7)(Y - 1.53) = 22.4$; b, by the new technique, $(X - 24.7)(Y - 1.45) = 42.2$.

Then, multiplying each value of x, x², xy, $(x - a')e_y^2 R^2/F^2$, etc., by the corresponding value of w and summing, and substituting these figures in the matrix equation (as the matrix is symmetrical, the terms below the principal diagonal may be omitted), we have:

$$\begin{pmatrix} 2502.4500 & 67700.41 & 968.30 \\ & 2794365 & 28503 \\ & & 383 \end{pmatrix} \begin{pmatrix} a \\ b \\ c - ab \end{pmatrix} = \begin{pmatrix} 166197.1405 - 58.4210 \\ 5907159.49 - 12.24 \\ 67700.41 \end{pmatrix}$$


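Multiplying by the reciprocal of the matrix:

$$\begin{pmatrix} a \\ b \\ c - ab \end{pmatrix} = \frac{1}{D} \begin{pmatrix} 257820786 & 1670197.87 & -776118843.27 \\ 1670197.87 & 20833.46 & -5773025.3470 \\ -776118843.27 & -5773025.3470 & 2409413180.08 \end{pmatrix} \begin{pmatrix} 166138.7196 \\ 5907147.25 \\ 67700.41 \end{pmatrix},$$

where D = 6740830567.4857.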


Performing the matrix multiplication, we get: 


a = 23.2250, b = 1.44117, c − ab = 10.7937, whence c = 44.2649.


TABLE 2
CALCULATIONS FOR THE MATRIX EQUATION

(x − a')(y − b')   R        R²/F²    (x − a')e_y²R²/F²   (y − b')e_x²R²/F²
33.12              −0.67    0.0090   0.01792             0.01242
34.71              0.92     0.0097   0.02161             0.00863
39.60              5.81     0.2005   0.56127             0.13233
36.80              3.01     0.0296   0.09831             0.01184
16.02              −17.77   0.3942   1.77390             0.14191
42.49              8.70     0.0083   0.12418             0.00232
Sum: 202.74                 0.6513

In view of the large standard error of the coefficients (see below), we do not think these values differ sufficiently from the first estimates, a' = 30 and b' = 1.5, for a further approximation to be warranted; we have, however, carried out a series of further approximations to show the convergence of the procedure, giving:

TABLE 3
ESTIMATES FOR SUCCESSIVE APPROXIMATIONS

          Rough      Approximations
          estimate   1          2          3          4          5          6
a         30         23.2250    24.9374    24.6337    24.7286    24.7199    24.7207
b         1.5        1.44117    1.44997    1.44867    1.44913    1.44908    1.44908
c         33.79      44.2659    41.9601    42.3508    42.2179    42.2296    42.2284
ΣR²/F²    0.6513     0.53187    0.52623    0.52603    0.52603    0.52603    0.52603


At each stage we have also calculated (using the new values a, b, and c, not the original estimates a', b', and c') the sum of squared standardized normal residuals $\sum R^2/F^2$, which is the quantity being minimized. The next approximation is, of course, made using the values of a, b, c, and $\sum R^2/F^2$ just obtained.

In order to check the effect of the small additional terms $\sum w(x - a')e_y^2 R^2/F^2$ and $\sum w(y - b')e_x^2 R^2/F^2$, we have also made calculations using the simple equation without these terms; this gives as first approximation a = 25.4627, b = 1.45568, c = 41.1231, $\sum R^2/F^2$ = 0.52790, which is quite a useful rough estimate; but further approximations, so far from improving this result, actually increase $\sum R^2/F^2$, and converge to about a = 25.991, b = 1.4572, c = 40.494, $\sum R^2/F^2$ = 0.5309.
TESTS OF THE GOODNESS OF FIT OF THE CALCULATED LINE


It is usually desirable to have some measure of the goodness of fit of the calculated line to the observations, either as an indication of the confidence to be placed in the line, or as a measure of the scatter of the observations. For this the quantity $\sum R^2/F^2/(N - 3)$ is appropriate, while its square root, the best estimate of the root-mean-square standardized normal residual, serves as a useful test of whether the estimated errors of measurement of x and y are enough to account entirely for the departure of the points from the line; for if $\sqrt{\sum R^2/F^2/(N - 3)}$ exceeds unity¹, it is evident that the actual standard deviation of the departure of the measured points from the line exceeds our estimated errors. In the present case, $\sqrt{\sum R^2/F^2/(N - 3)}$ is approximately 0.41, and we conclude that our estimates of the errors of measurement are enough to account entirely for the departure of the points from the calculated curve.
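The arithmetic of this test is short, but a sketch makes it explicit. Assuming the converged value ΣR²/F² = 0.52603 from Table 3 and N = 6 points with three fitted constants, it gives S² ≈ 0.175 and a root-mean-square normal residual of about 0.42, well below unity. The function name is hypothetical.

```python
import math


def goodness_of_fit(sum_sq_normal, n_points, n_constants=3):
    """Return S^2 = sum(R^2/F^2) / (N - 3) and its square root."""
    s2 = sum_sq_normal / (n_points - n_constants)
    return s2, math.sqrt(s2)


s2, rms = goodness_of_fit(0.52603, 6)
verdict = ("assessed errors account for the scatter" if rms <= 1.0
           else "scatter exceeds the assessed errors")
print(f"S^2 = {s2:.5f}, rms normal residual = {rms:.3f}: {verdict}")
```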

It is also often desired to have some figure indicative of the degree of confidence that can be placed in the calculated constants a, b, and c, and for this it is usual to calculate the variances or the standard deviations of estimate of the constants. These are readily calculated from the quantity $\sum R^2/F^2/(N - 3)$; if we call this quantity $S^2$, we have for the variances, to a fair approximation²:

$$\sigma_a^2 = HS^2F_1^2/D, \qquad \sigma_b^2 = KS^2F_1^2/D,$$

$$\sigma_c^2 = S^2F_1^2\,(Hb^2 + Ka^2 + L + 2Pab + 2Qb + 2Ra)/D,$$

where $F_1^2$ is that value of $F^2$ that corresponded to w = 1 in the course of the calculations.
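A sketch of these variance formulas, assuming the matrix entries printed in the worked example, S² = 0.17729, F₁² = 9081.919, and the first-approximation values of a and b. It computes the cofactors itself rather than relying on the printed reciprocal.

```python
import math

# Sums of the first approximation, as printed in the example's matrix equation
Swyy, Swxy, Swy = 2502.45, 67700.41, 968.30
Swxx, Swx, Sw = 2794365.0, 28503.0, 383.0

# Cofactors of the symmetric matrix (H, K, L, P, Q, R of the text) and determinant D
H = Sw * Swxx - Swx ** 2
K = Sw * Swyy - Swy ** 2
L = Swyy * Swxx - Swxy ** 2
P = Swx * Swy - Sw * Swxy
Q = Swx * Swxy - Swxx * Swy
R = Swy * Swxy - Swx * Swyy
D = H * Swyy + P * Swxy + Q * Swy

S2 = 0.17729       # sum R^2/F^2 / (N - 3), first approximation
F1_sq = 9081.919   # F^2 of the point that received weight w = 1
a, b = 23.2250, 1.44117

scale = S2 * F1_sq / D
sigma_a = math.sqrt(H * scale)
sigma_b = math.sqrt(K * scale)
sigma_c = math.sqrt((H * b ** 2 + K * a ** 2 + L + 2 * P * a * b
                     + 2 * Q * b + 2 * R * a) * scale)
print(f"D = {D:.1f}")
print(f"sigma_a = {sigma_a:.2f}, sigma_b = {sigma_b:.4f}, sigma_c = {sigma_c:.2f}")
```

It should give σ_a ≈ 7.85, σ_b ≈ 0.0705 and σ_c ≈ 11.6, the figures quoted for the first approximation.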


¹Since the expected value of the sum of squared standardized residuals, $\sum\{(x - \xi)^2/e_x^2 + (y - \eta)^2/e_y^2\}$, is 2N if $e_x$, $e_y$ are true estimates of the error variances, it might appear that the test value of $\sqrt{\sum R^2/F^2/(N - 3)}$ should be $\sqrt{2}$ rather than unity. However, $\sum R^2/F^2$ is a sum of squared normal residuals; that is, it is not the sum of the squared distances between the true and observed points in standardized measure, but the sum of the squared normals from the observed points to the curve, which can be shown to have an expected value only half that of the former quantity (see Appendix below).


²More exactly, the variances should be derived from the terms of the reciprocal of

$$\begin{pmatrix} \sum wy^2 - E_y & \sum wxy & \sum wy \\ \sum wxy & \sum wx^2 - E_x & \sum wx \\ \sum wy & \sum wx & \sum w \end{pmatrix},$$

where $E_y = \sum w e_y^2 R^2/F^2$ and $E_x = \sum w e_x^2 R^2/F^2$; but since $E_y$ and $E_x$ will always be small compared to $\sum wy^2$ and $\sum wx^2$, the approximations H/D, etc., are usually adequate. In our example, $E_y$ = 0.841 and $E_x$ = 31.7, and with these correction terms $\sigma_a$ becomes 7.98, $\sigma_b$ 0.0711, and $\sigma_c$ 11.77, increases of 2 percent, 1 percent, and 1½ percent respectively.
In our example, $S^2$ = 0.17729 for the first approximation, from which we calculate $\sigma_a$ = 7.85, $\sigma_b$ = 0.0705, $\sigma_c$ = 11.60; the standard deviations for the later approximations differ from these figures by less than 0.6 percent. The experimental data are thus satisfactorily represented by the equation:


$$(X - 24.72 \pm 7.9)(Y - 1.449 \pm 0.071) = 42.23 \pm 11.6.$$


This is plotted in Figure 2, where it is compared with the equation 
computed by Brown’s method, namely (X — 41.7)(Y — 1.53) = 22.4; 
the improved fit is easily seen. 

The present procedure is well adapted for working either with a 
desk calculator or an automatic electronic machine. By kind permis- 
sion of Dr. Fox, one of the authors (E. N. H.) has programmed the 
procedure for use on the Ferranti Mercury machine of the Oxford 
Computing Laboratory. 


APPENDIX 


M. G. Kendall [1956] describes an unexplained peculiarity of the maximum likelihood solution to a linear relation of two variables both subject to experimental error, an inconsistent estimate of the error variance; and we have asserted above that the expected value of the squared standardized residual computed for our solution will be unity, being a normal residual, whereas the expected value of the squared standardized distance between the true and observed points, the residual proper, will be 2.

The basic source of these difficulties is a fact not brought out in any derivation of a maximum likelihood estimate of a relation of two or more variables all subject to experimental error that has come to our notice: we cannot confine ourselves to finding those values of the coefficients that will maximize the likelihood function; because the function involves the unknown locations of the true points on the line or surface of relation, we are compelled at the same time to find those estimated locations of the true points that will maximize the likelihood function. We do not, as a rule, desire to know these special points (which are in fact the feet of the normals from the observed points to the line or surface), and because we do not actually evaluate them it is easy to overlook the fact that our estimated residuals apply to them and not to the actual true points. That is, we are estimating the root-mean-square normal residual, not the root-mean-square residual proper.

Thus, for example, Kendall's equations 4.62 and 4.63 (loc. cit.) involve, not the true unknown U, but an estimate of U having the property of maximizing log L, obtained from equation 4.50. Hence Kendall's supposed estimate of the error variance is really an estimate of the mean squared normal residual in terms of the mean squared residual proper; and as will now be shown, his expected value of ½ is perfectly in order.

Let $(X/e_x, Y/e_y)$ be the coordinates of a true point in standardized measure, X and Y obeying the relation $f(X, Y) = 0$. Then, if the experimental errors in X and Y are statistically independent, the locus of a set of equiprobable observed points $(x/e_x, y/e_y)$ will be the circle

$$\left(\frac{x - X}{e_x}\right)^2 + \left(\frac{y - Y}{e_y}\right)^2 = r^2,$$

with centre at the true point and radius r, where $r^2$ is a function of the probability of such an observed point, and may be set equal to the mean squared standardized residual. All points on this circle are equally probable, and we seek to find, in terms of the radius r, the mean value of the square on the normal from a point on the circle to the locus $f(X, Y) = 0$, which passes through the centre of the circle.

So long as the radius of curvature of the curve is large compared to the radius r, the curve and the normals from the circle to the curve may be replaced, to a good approximation, by the tangent to the curve at the true point (the centre of the circle) and the normals from the circle to this tangent. Now for any point on the circle whose radius vector makes an angle θ with this tangent, the length of the normal to the tangent will be $r \sin\theta$; taking θ as the variable of integration since all values of θ are equiprobable, the mean of the square on the normal to the tangent will be

$$\frac{1}{\pi}\int_0^{\pi} r^2 \sin^2\theta \, d\theta = \frac{r^2}{2}.$$


That is, the expected value of the squared standardized normal residual is half that of the squared standardized residual to a close approximation, or exactly half if the functional relation is linear.
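The argument can be checked by simulation. The sketch below assumes independent unit-variance (standardized) errors and a linear relation through the true point, in an arbitrary direction, and estimates the mean squared residual proper and the mean squared normal residual. The function name and seed are illustrative; the function also accepts more than two variables.

```python
import math
import random


def mean_squared_residuals(n_vars=2, trials=100_000, seed=12345):
    """Monte Carlo estimate of (mean squared residual proper,
    mean squared normal residual) for a linear relation through the true point."""
    rng = random.Random(seed)
    # unit normal vector of an arbitrary linear relation (line, plane, ...)
    u = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
    norm = math.sqrt(sum(v * v for v in u))
    u = [v / norm for v in u]
    total_proper = total_normal = 0.0
    for _ in range(trials):
        err = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]   # standardized errors
        total_proper += sum(e * e for e in err)
        dist = sum(e * v for e, v in zip(err, u))            # signed normal distance
        total_normal += dist * dist
    return total_proper / trials, total_normal / trials


for n in (2, 3):
    proper, normal = mean_squared_residuals(n)
    print(f"n = {n}: residual proper {proper:.3f}, normal residual {normal:.3f}")
```

The residual proper should average close to n and the normal residual close to 1, the linear case being exact.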

The above applies to the bivariate case; if there are n variables, the expected value of the squared standardized residual will be n, but that of the squared standardized normal residual remains unity, exactly if the functional relation is linear, and approximately in the general
A STOCHASTIC STUDY OF THE LIFE TABLE
AND ITS APPLICATIONS:
I. PROBABILITY DISTRIBUTIONS OF THE
BIOMETRIC FUNCTIONS¹


CHIN LONG CHIANG


Division of Biostatistics, School of Public Health, 
University of California, Berkeley, California, U. S. A. 


1. INTRODUCTION 


The life table is one of the oldest, most useful, and best-known 
topics in the field of statistics. It has many applications in various 
areas of research where birth, death, and illness may take place. The 
earliest life tables date as far back as the seventeenth century; Halley’s 
famous table for the City of Breslau, published in the year 1693 [9], 
already contained most of the columns in use today. The subject 
matter, however, is by no means limited to human beings. Zoologists, 
biologists, physicists, manufacturers, and investigators in other fields 
have found the life table a valuable means of presenting their data. 
In spite of its popularity in many research areas, the life table as a 
subject has yet to be systematically explored from a statistical point 
of view. 

There are two forms of the life table in general use: the cohort 
(or generation) life table and the current life table. In its strictest 
form a cohort life table records the actual mortality experience of a 
given group of individuals over a period of time extending from birth 
until the death of the last member of the group. A current life table, 
on the other hand, considers the mortality experience of an entire 
population at one point in time. The purpose of this investigation 
is to present a stochastic view of the subject, taking random variation 
into consideration and treating all the biometric functions as random 
variables. The results of our study will be given in a series of papers. 
In the first paper probability distributions of the main biometric func- 
tions are presented and formulas are derived for the corresponding 
mathematical expectations, variances, and covariances. Some of the 
findings are by no means original, but they are included for the sake 
of completeness. 


¹Presented at the joint meeting of the American Statistical Association and the Biometric Society, ENAR, in Atlantic City, September 13, 1957, under the title, “An application of stochastic processes to the life table and standard error of age-adjusted rates” [3].
Although each of the biometric functions has the same meaning
and the same probability distribution in the cohort life table as in the 
current life table, it is important for the study of their random variation 
to keep in mind the order in which these functions are computed. 
In the cohort life table the number of survivors and the number of 
deaths are measured directly in an actual population; thus they are 
the basic random variables from which the proportion of deaths and 
other columns are obtained. In the current life table, on the other 
hand, the column of the proportion of deaths is first computed from 
the population death rate; other biometric functions are random vari- 
ables only because they are functions of this proportion. We shall, 
therefore, in the second paper of this series present formulas for the 
sample variances and covariances of the biometric functions in terms 
of the number of survivors for the cohort life table and in terms of 
the actual age specific mid-year population and age specific death rate 
for the current life table. 

The third paper will be devoted to the application of these formulas 
to practical problems in follow-up studies of patients affected with 
specific diseases in which there are some survivors on the closing date 
of the study; because of incompleteness of information, expectation 
of life and some other quantities in the life table cannot then be com- 
puted by the conventional method. Here we suggest a convenient 
means of computing the observed expectation of life and the corre- 
sponding standard error. The problem of competing risks is also 
treated. An actual follow-up study will be used by way of illustration. 

The general form of the life table is reproduced below for the purpose 
of reference; the symbols used deviate slightly from the conventional 
ones in order to simplify formulas in the text. For a detailed descrip- 
tion of life-table structure, the reader is referred to the work of Dublin, 
Lotka, and Spiegelman [5], Greville [8], and Reed and Merrell [11]. 

In the table, and throughout this paper, the term “age” refers to the exact age. The symbol $x_i$ is the age at the beginning of the interval i; $x_w$ will be used to denote the age at the beginning of the final interval in any given life table.

The age $x_0$ may be taken as 0, the time of birth, and $l_0$ as the size of the original cohort. From $l_1$ on, all the biometric functions in the life table are treated as random variables that are estimators of the corresponding unknown quantities. The symbol $q_i$ will be used to denote the unknown true probability of a person of age $x_i$ dying between $x_i$ and $x_{i+1}$, and $e_i$ the true expectation of life at age $x_i$, for $i = 0, 1, \ldots, w$.

The term “observed expectation of life” is introduced for the symbol $\hat{e}_i$ to distinguish it from its unknown true value $e_i$. Because of their limited use, we shall not discuss the distribution of the quantities in the columns $L_i$ and $T_i$. If desired, their distributions and formulas for expectations and variances can be obtained, respectively, from those of $l_i$ and $\hat{e}_i$.

620 BIOMETRICS, DECEMBER 1960

LIFE TABLE

Age interval | Number of survivors at age x_i | Proportion of deaths within age interval | Number of deaths within age interval | Number of years lived within age interval | Total no. of years remaining to survivors | Observed expectation of life at age x_i
x_0 to x_1 | l_0 | q̂_0 | d_0 | L_0 | T_0 | ê_0
...
x_i to x_{i+1} | l_i | q̂_i | d_i | L_i | T_i | ê_i

In the text the following symbols will also be introduced:

$q_{ij}$ = Pr[an individual alive at age $x_i$ will die in interval $(x_i, x_j)$],
$p_{ij}$ = Pr[an individual alive at age $x_i$ will survive to age $x_j$].

When $x_j = x_{i+1}$, we will drop the second subscript and write $q_i$ for $q_{i,i+1}$ and $p_i$ for $p_{i,i+1}$. Obviously, q and p are complementary. The corresponding estimators are denoted by

$$\hat{q}_{ij} = 1 - \hat{p}_{ij} \quad \text{and} \quad \hat{q}_i = 1 - \hat{p}_i.$$

Finally, we will write $n_i$ to denote the length of the interval i (i.e., $x_{i+1} - x_i = n_i$). When $n_i$ is equal to one for each age interval, we have the “complete” life table.

Throughout this investigation, we shall assume a homogeneous population in which each individual is subject to the same force of mortality and in which the probability of death for one individual is not influenced by the death of any other individual in the group.


2. PROBABILITY DISTRIBUTION OF $l_x$, THE NUMBER
OF SURVIVORS AT AGE x


In the usual life table the various biometric functions are given only for integral ages or at other discrete intervals. In the derivation of the distribution of survivors, however, it is more convenient to treat age as a continuous variable and to derive formulas for $l_x$, the number of individuals surviving the age interval (0, x), for any positive value x.
The distribution of $l_x$ may be obtained by different approaches. Perhaps the simplest is to consider the $l_x$ survivors as the number of successes in $l_0$ independent and identical trials with a probability $p_{0x}$ of surviving the interval (0, x). It follows then that $l_x$ is a binomial random variable. However, this approach by itself does not give the formula for the probability $p_{0x}$. The explicit formula for $p_{0x}$ can be derived by the “pure death process” (see, for example, [6] and [1]), which we shall sketch below.

Let $\mu_x$ be the force of mortality acting upon each individual in the original cohort $l_0$, such that

$$\mu_x\,\Delta x + o(\Delta x) = \Pr[\text{an individual alive at age } x \text{ will die between ages } x \text{ and } x + \Delta x], \quad x \ge 0, \qquad (1)$$

where Δx stands for an infinitesimal time interval and $o(\Delta x)$ a quantity of a smaller order of magnitude than Δx. We are interested in the probability function of $l_x$, given that there are $l_0$ individuals alive at age 0:

$$P_{l_0 k}(0, x) = \Pr\{l_x = k \mid l_0 \text{ at age } 0\}. \qquad (2)$$

The standard procedure for obtaining this probability function is to derive an explicit form of the probability generating function defined as

$$G_{l_x}(t; x) = E(t^{l_x} \mid l_0) = \sum_{k=0}^{l_0} t^k P_{l_0 k}(0, x). \qquad (3)$$

The derivatives of this one function provide a convenient way of computing all of the probabilities in (2), and the moments of $l_x$ as well. Using the established procedure [6], we find

$$G_{l_x}(t; x) = \left[1 - \exp\left\{-\int_0^x \mu_\tau\,d\tau\right\} + t \exp\left\{-\int_0^x \mu_\tau\,d\tau\right\}\right]^{l_0}. \qquad (4)$$

Substituting

$$p_{0x} = \exp\left\{-\int_0^x \mu_\tau\,d\tau\right\} \qquad (5)$$

for the exponential function in (4) gives the generating function of the probability stated in (2):

$$G_{l_x}(t; x) = \left[1 - p_{0x} + p_{0x}\,t\right]^{l_0}, \quad x \ge 0. \qquad (6)$$


Formula (6) will be recognized as the generating function of a binomial random variable in $l_0$ independent and identical trials with the binomial probability $p_{0x}$ as given by formula (5). For $x = x_i$, we have the probability that an individual will survive the age interval $(0, x_i)$,

$$p_{0i} = \exp\left\{-\int_0^{x_i} \mu_\tau\,d\tau\right\}, \quad i = 0, 1, \ldots, w, \qquad (5A)$$

and the generating function for the survivors $l_i$,

$$G_{l_i}(t) = \left[1 - p_{0i} + p_{0i}\,t\right]^{l_0}, \quad i = 0, 1, \ldots, w. \qquad (6A)$$

We are now in a position to use the binomial theorem to obtain the required probability function for $l_i$,

$$\Pr\{l_i = k \mid l_0 \text{ at age } 0\} = \frac{l_0!}{k!\,(l_0 - k)!}\,p_{0i}^{k}\,q_{0i}^{l_0 - k}, \quad k = 0, 1, \ldots, l_0, \qquad (7)$$

the mathematical expectation,

$$E(l_i \mid l_0) = l_0\,p_{0i}, \quad i = 0, 1, \ldots, w, \qquad (8)$$

and the variance

$$\sigma_{l_i}^2 = l_0\,p_{0i}\,q_{0i}, \quad i = 0, 1, \ldots, w, \qquad (9)$$

with $p_{0i} + q_{0i} = 1$.
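A minimal sketch of formulas (5A) and (7) to (9), assuming a hypothetical Gompertz force of mortality μ_x = B e^{Cx} chosen only for illustration (B = 0.01 and C = 0.08 are not from the paper). The integral in (5A) is evaluated by the trapezoidal rule, and the binomial probabilities then give the mean and variance directly.

```python
import math

B, C = 0.01, 0.08          # hypothetical Gompertz parameters, illustration only


def mu(t):
    """Hypothetical force of mortality mu_t = B * exp(C * t)."""
    return B * math.exp(C * t)


def survival_probability(force, x, steps=10_000):
    """p_0x = exp(-integral_0^x force(t) dt), integral by the trapezoidal rule."""
    h = x / steps
    inner = sum(force(k * h) for k in range(1, steps))
    integral = h * (0.5 * force(0.0) + inner + 0.5 * force(x))
    return math.exp(-integral)


def survivor_distribution(l0, p):
    """Binomial probabilities Pr{l_i = k | l_0}, k = 0, ..., l0 (formula (7))."""
    q = 1.0 - p
    return [math.comb(l0, k) * p ** k * q ** (l0 - k) for k in range(l0 + 1)]


l0, x_i = 100, 20.0
p0i = survival_probability(mu, x_i)
pmf = survivor_distribution(l0, p0i)
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum(k * k * pk for k, pk in enumerate(pmf)) - mean ** 2
print(f"p_0i = {p0i:.5f}")
print(f"mean {mean:.4f} vs l0*p = {l0 * p0i:.4f}")
print(f"var  {var:.4f} vs l0*p*q = {l0 * p0i * (1 - p0i):.4f}")
```

For a Gompertz force the integral also has the closed form (B/C)(e^{Cx} − 1), which provides a check on the quadrature.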
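In general, the probability of surviving an age interval $(x_i, x_j)$ is given by

$$p_{ij} = \exp\left\{-\int_{x_i}^{x_j} \mu_\tau\,d\tau\right\}, \quad i \le j;\ i, j = 0, 1, \ldots, w, \qquad (10)$$

with the obvious relationship,

$$p_{\alpha j} = p_{\alpha i}\,p_{ij}, \quad \alpha \le i \le j;\ \alpha, i, j = 0, 1, \ldots, w. \qquad (11)$$

The generating function for the conditional distribution of $l_j$ given $l_i$ is

$$G_{l_j \mid l_i}(t) = E(t^{l_j} \mid l_i) = \left[1 - p_{ij} + p_{ij}\,t\right]^{l_i}, \quad i \le j;\ i, j = 0, 1, \ldots, w. \qquad (12)$$

When $j = i + 1$, (12) becomes

$$G_{l_{i+1} \mid l_i}(t) = \left[1 - p_i + p_i\,t\right]^{l_i}, \quad i = 0, 1, \ldots, w - 1. \qquad (13)$$

Although formula (12) holds whatever may be $x_i < x_j$, it is important to point out that the conditional probabilities of $l_j$ relative to $l_0, l_1, \ldots, l_i$ are the same as those relative to $l_i$, in the sense that for each k

$$\Pr\{l_j = k \mid l_0, \ldots, l_i\} = \Pr\{l_j = k \mid l_i\}, \quad i < j;\ i, j = 0, 1, \ldots, w.$$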
In other words, the sequence $l_0, l_1, \ldots, l_w$ is a Markov process ([6], p. 338). Thus we have

$$E(l_j \mid l_0, \ldots, l_i) = E(l_j \mid l_i), \quad i \le j;\ i, j = 0, 1, \ldots, w, \qquad (14)$$

and also

$$E(t_j^{l_j} \mid l_0, \ldots, l_i) = E(t_j^{l_j} \mid l_i), \quad i < j;\ i, j = 0, 1, \ldots, w. \qquad (15)$$

3. JOINT DISTRIBUTION OF $l_1, \ldots, l_w$, THE
NUMBERS OF SURVIVORS


Following the idea of the preceding section, we introduce the generating function of the joint probability distribution of $l_1, \ldots, l_w$:

$$G_{l_1, \ldots, l_w}(t_1, \ldots, t_w) = E(t_1^{l_1} \cdots t_w^{l_w} \mid l_0), \qquad (16)$$

which uniquely determines the joint probability

$$\Pr\{l_1 = k_1, \ldots, l_w = k_w \mid l_0 \text{ at age } 0\}.$$

Using a procedure described previously ([2], pp. 84-85) we obtain

Lemma 1. The survivors $l_1, \ldots, l_w$ in the life table form a random vector with components having the binomial distribution; the generating function of the joint distribution and the covariance between any two of the random variables are given, respectively, by

$$G_{l_1, \ldots, l_w}(t_1, \ldots, t_w) = \left[1 - \{p_{01}(1 - t_1) + p_{02}t_1(1 - t_2) + \cdots + p_{0w}t_1 \cdots t_{w-1}(1 - t_w)\}\right]^{l_0} \qquad (17)$$

and

$$\sigma_{l_i, l_j} = l_0\,p_{0j}(1 - p_{0i}), \quad i \le j;\ i, j = 0, 1, \ldots, w. \qquad (18)$$
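Proof of formula (17) follows from the identity

$$E(t_1^{l_1} \cdots t_{i+1}^{l_{i+1}}) = E\left[t_1^{l_1} \cdots t_i^{l_i}\,E\{t_{i+1}^{l_{i+1}} \mid l_0, l_1, \ldots, l_i\}\right] \qquad (19)$$

and from formula (15). Combining (15) and (19), we can write

$$E(t_1^{l_1} \cdots t_{i+1}^{l_{i+1}}) = E\left[t_1^{l_1} \cdots t_i^{l_i}\,E\{t_{i+1}^{l_{i+1}} \mid l_i\}\right], \quad i = 0, 1, \ldots, w - 1, \qquad (20)$$

where the conditional expectation of the quantity inside the braces is the generating function of the conditional distribution of $l_{i+1}$ given $l_i$, with the explicit function as presented in (13). Formula (17) is obviously true for w = 1, since in this case (17) becomes

$$G_{l_1}(t_1) = \left[1 - p_{01}(1 - t_1)\right]^{l_0}, \qquad (21)$$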
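which is identical to (6) for $x = x_1$. Now suppose (17) is true for w − 1, so that we may write

$$G_{l_1, \ldots, l_{w-1}}(t_1, \ldots, t_{w-1}) = \left[1 - \{p_{01}(1 - t_1) + p_{02}t_1(1 - t_2) + \cdots + p_{0,w-1}t_1 \cdots t_{w-2}(1 - t_{w-1})\}\right]^{l_0}; \qquad (22)$$

we want to prove that (17) is true also for w. Using identity (20) for i = w − 1, generating function (16) may be written as

$$G_{l_1, \ldots, l_w}(t_1, \ldots, t_w) = E\left[t_1^{l_1} \cdots t_{w-1}^{l_{w-1}}\,E\{t_w^{l_w} \mid l_{w-1}\}\right]. \qquad (23)$$

Writing (13) for i = w − 1 and substituting in (23) gives

$$G_{l_1, \ldots, l_w}(t_1, \ldots, t_w) = E\left[t_1^{l_1} \cdots t_{w-2}^{l_{w-2}}\,(t'_{w-1})^{l_{w-1}}\right], \qquad (24)$$

where

$$t'_{w-1} = t_{w-1}(1 - p_{w-1} + p_{w-1}t_w) = t_{w-1}\left[1 - p_{w-1}(1 - t_w)\right]. \qquad (25)$$

Because of formula (22), (24) becomes

$$\left[1 - \{p_{01}(1 - t_1) + p_{02}t_1(1 - t_2) + \cdots + p_{0,w-2}t_1 \cdots t_{w-3}(1 - t_{w-2}) + p_{0,w-1}t_1 \cdots t_{w-2}(1 - t'_{w-1})\}\right]^{l_0}. \qquad (26)$$

Now substituting (25) in the last term inside the braces,

$$p_{0,w-1}t_1 \cdots t_{w-2}(1 - t'_{w-1}) = p_{0,w-1}t_1 \cdots t_{w-2}(1 - t_{w-1}) + p_{0w}t_1 \cdots t_{w-1}(1 - t_w),$$

where $p_{0w}$ is written for $p_{0,w-1}p_{w-1}$ [equation (11)]. Formula (26) thus becomes identical with the generating function (17), and the proof is completed.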

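Formula (18) can be proven by direct computation from the relation

$$\sigma_{l_i, l_j} = \frac{\partial^2 G}{\partial t_i\,\partial t_j} - \frac{\partial G}{\partial t_i}\,\frac{\partial G}{\partial t_j}, \quad i < j,$$

where the symbol G is written for the generating function (17) and the partial derivatives are taken at the point $(t_1, \ldots, t_w) = (1, \ldots, 1)$. When i = j, formula (18) reduces to the formula for the variance of $l_i$ [equation (9)].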
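Formula (18) can be checked by simulating the survivors as the Markov chain described above: each $l_{i+1}$ is a binomial thinning of $l_i$ with probability $p_i$. The survival probabilities, cohort size, and seed below are hypothetical, and the simulated covariances should agree with $l_0 p_{0j}(1 - p_{0i})$ only up to sampling error.

```python
import random


def simulate_survivors(l0, p_steps, rng):
    """One cohort: l_0, l_1, ..., l_w by successive binomial thinning with p_i."""
    counts = [l0]
    for p in p_steps:
        counts.append(sum(1 for _ in range(counts[-1]) if rng.random() < p))
    return counts


def covariance(u, v):
    """Sample covariance of two equal-length sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)


l0 = 100
p_steps = [0.9, 0.8, 0.7]            # hypothetical p_0, p_1, p_2
p0 = [1.0]
for p in p_steps:
    p0.append(p0[-1] * p)            # p_0i from formula (11)

rng = random.Random(2024)
runs = [simulate_survivors(l0, p_steps, rng) for _ in range(10_000)]
cols = list(zip(*runs))              # cols[i] holds the simulated values of l_i

for i, j in [(1, 2), (1, 3), (2, 3), (3, 3)]:
    theory = l0 * p0[j] * (1 - p0[i])          # formula (18), i <= j
    print(f"cov(l_{i}, l_{j}): simulated {covariance(cols[i], cols[j]):7.3f}, "
          f"formula {theory:7.3f}")
```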

The joint probability of the random variables $l_1, \ldots, l_w$ can now be obtained from (17) by differentiating with respect to the arguments. It turns out to be

$$\Pr\{l_1 = k_1, \ldots, l_w = k_w \mid l_0\} = \prod_{i=1}^{w} \frac{k_{i-1}!}{k_i!\,(k_{i-1} - k_i)!}\,p_{i-1}^{k_i}\,q_{i-1}^{k_{i-1} - k_i}, \quad k_i = 0, 1, \ldots, k_{i-1}, \text{ with } k_0 = l_0. \qquad (27)$$

4. JOINT PROBABILITY DISTRIBUTION OF $d_0, \ldots, d_w$,
THE NUMBERS OF DEATHS


In a life table covering the entire life span of each individual in a given population, the sum of the deaths at all ages is equal to the size of the original cohort. Symbolically,

$$d_0 + d_1 + \cdots + d_w = l_0. \qquad (28)$$

Each individual in the original cohort has a probability of dying in the interval $(x_i, x_{i+1})$, which is easily shown to be

$$p_{0i}\,q_i, \quad i = 0, \ldots, w; \qquad (29)$$

for, if an individual at age 0 is to die between ages $x_i$ and $x_{i+1}$, he must first survive the age interval $(0, x_i)$. The multiplication theorem implies (29). Since he is to die once and only once somewhere in the life span covered by the life table, the sum of the probabilities in (29) is unity; or

$$p_{00}q_0 + p_{01}q_1 + \cdots + p_{0w}q_w = 1,$$

where $p_{00} = 1$ and $q_w = 1$. Thus we have the well-known
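Lemma 2. The numbers of deaths, $d_0, \ldots, d_w$, in a life table have a multinomial distribution with the joint probability distribution

$$\Pr\{d_0 = \delta_0, \ldots, d_w = \delta_w \mid l_0\} = \frac{l_0!}{\delta_0! \cdots \delta_w!}\,(p_{00}q_0)^{\delta_0} \cdots (p_{0w}q_w)^{\delta_w}, \quad \delta_0 + \cdots + \delta_w = l_0; \qquad (30)$$

the expectation, variance, and covariance are given, respectively, by

$$E(d_i \mid l_0) = l_0\,p_{0i}q_i, \quad i = 0, \ldots, w, \qquad (31)$$

$$\sigma_{d_i}^2 = l_0\,p_{0i}q_i(1 - p_{0i}q_i), \quad i = 0, \ldots, w, \qquad (32)$$

and

$$\sigma_{d_i, d_j} = -l_0\,p_{0i}q_i\,p_{0j}q_j, \quad i \ne j;\ i, j = 0, \ldots, w. \qquad (33)$$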
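A short sketch of (29) to (33) with hypothetical values of $q_i$ (the last set to 1, as the final interval requires). The death probabilities $p_{0i}q_i$ sum to one, and the multinomial moments follow directly. The function name is illustrative.

```python
def death_probabilities(q):
    """pi_i = p_0i * q_i for each interval; q[-1] must be 1 (final interval)."""
    if q[-1] != 1.0:
        raise ValueError("the final interval must have q_w = 1")
    probs, p0i = [], 1.0
    for qi in q:
        probs.append(p0i * qi)
        p0i *= 1.0 - qi              # survival to the start of the next interval
    return probs


q = [0.1, 0.2, 0.3, 1.0]             # hypothetical q_0, ..., q_w
l0 = 1000
pi = death_probabilities(q)
expect = [l0 * p for p in pi]                  # formula (31)
var = [l0 * p * (1 - p) for p in pi]           # formula (32)
cov_01 = -l0 * pi[0] * pi[1]                   # formula (33) with i = 0, j = 1
print("death probabilities:", pi, "sum:", sum(pi))
print("E(d_i):", expect)
print("var(d_i):", var, "cov(d_0, d_1):", cov_01)
```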


Remark 1: In the above discussion, age 0 was chosen only for simplicity of presentation. For any given age, say $x_\alpha$, the numbers of deaths occurring in subsequent intervals also have a multinomial distribution with the total number of deaths equal to the number of survivors at age $x_\alpha$. The probability that an individual alive at age $x_\alpha$ will die in the interval $(x_i, x_{i+1})$ subsequent to $x_\alpha$ is given by

$$p_{\alpha i}\,q_i, \quad i = \alpha, \ldots, w. \qquad (34)$$

It can be readily shown that the sum of the probabilities in (34) is unity, but we shall not give the details here.
5. VARIANCE AND COVARIANCE OF $\hat q_i$, THE PROPORTION OF DEATHS IN THE AGE INTERVAL $(x_i, x_{i+1})$


The proportion of deaths occurring in an age interval is the ratio of two random variables

$$\hat q_i = \frac{d_i}{l_i}, \qquad i = 0, 1, \ldots, w. \tag{35}$$


Our interest in this section is to derive formulas for the expectation, 
variance, and covariance of these proportions. 

It is convenient at this point to reintroduce the proportion of survivors in the age interval $(x_i, x_{i+1})$,

$$\hat p_i = \frac{l_{i+1}}{l_i}, \qquad i = 0, 1, \ldots, w - 1. \tag{36}$$

Since

$$\hat p_i + \hat q_i = 1, \tag{37}$$

the mathematical expectation of the proportion of deaths is complementary to the expectation of the proportion of survivors. These proportions have the same formulas for the variance and covariance: $\sigma^2_{\hat q_i} = \sigma^2_{\hat p_i}$ and $\sigma_{\hat q_i, \hat q_j} = \sigma_{\hat p_i, \hat p_j}$, for $i, j = 0, 1, \ldots, w - 1$.

The generating function (13) shows that the conditional distribution of $l_{i+1}$ given $l_i$ is binomial and has the conditional expectation

$$E(l_{i+1} \mid l_i) = l_i\, p_i, \qquad i = 0, 1, \ldots, w - 1. \tag{38}$$


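From (38) we derive the expectation of $\hat p_i$,

$$E(\hat p_i) = E\left[E\left(\frac{l_{i+1}}{l_i} \,\middle|\, l_i\right)\right] = E\left[\frac{1}{l_i}\, l_i\, p_i\right] = p_i, \qquad i = 0, 1, \ldots, w - 1, \tag{39}$$

and hence the expectation of $\hat q_i$,

$$E(\hat q_i) = 1 - p_i = q_i.$$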
It is interesting to note from formula (10) that the ratio of the expectation of survivors at the end of an interval to the expectation of survivors at the beginning of the interval is also equal to the probability of surviving the interval. Consequently,

$$E\left(\frac{l_{i+1}}{l_i}\right) = \frac{E(l_{i+1})}{E(l_i)}, \qquad i = 0, 1, \ldots, w - 1;$$

that is, the expectation of the ratio of the two random variables $l_{i+1}$ and $l_i$ is equal to the ratio of their expectations, a relationship not necessarily true in general.


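The variance of $\hat p_i$ (or $\hat q_i$) may be written in the form

$$\sigma^2_{\hat p_i} = E\left[\frac{1}{l_i^2}\, E\left(l_{i+1}^2 \mid l_i\right)\right] - p_i^2,$$

where the conditional expectation

$$E\left(l_{i+1}^2 \mid l_i\right) = l_i\, p_i\, q_i + l_i^2\, p_i^2$$

is again obtained from the generating function (13). By substitution and collection of terms we have the formula for the variance,

$$\sigma^2_{\hat q_i} = \sigma^2_{\hat p_i}, \qquad i = 0, 1, \ldots, w - 1, \tag{41}$$

where

$$\sigma^2_{\hat p_i} = E\left(\frac{1}{l_i}\right) p_i\, q_i, \qquad i = 0, 1, \ldots, w - 1. \tag{42}$$

When $l_0$ is large, formula (42) may be approximated by

$$\sigma^2_{\hat p_i} \approx \frac{p_i\, q_i}{l_0\, p_{0i}}, \qquad i = 0, 1, \ldots, w - 1. \tag{43}$$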


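The expectation of the reciprocal of $l_i$ can be written as

$$E\left(\frac{1}{l_i}\right) = \frac{1}{E(l_i)}\left[1 + \frac{\sigma^2_{l_i}}{[E(l_i)]^2} - E\left\{\frac{[l_i - E(l_i)]^3}{[E(l_i)]^2\, l_i}\right\}\right], \tag{44}$$

where the second term inside the square brackets is the relative variance of $l_i$ and the third term is a quantity of a smaller order of magnitude than the relative variance. Using the formulas (8) and (9) for the expectation and variance of $l_i$, we have

$$\frac{\sigma^2_{l_i}}{[E(l_i)]^2} = \frac{l_0\, p_{0i}(1 - p_{0i})}{(l_0\, p_{0i})^2} = \frac{1 - p_{0i}}{l_0\, p_{0i}},$$

which may be taken as zero for large values of $l_0$. Consequently, the quantity inside the square brackets in (44) may be taken as unity and formula (42) is approximated by (43).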
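To see how small the neglected terms in (44) are, the following sketch computes $E(1/l_i)$ exactly for a binomial $l_i$ (conditioning on $l_i > 0$, in the spirit of footnote 3) and compares it with $1/E(l_i)$ and with the first two terms of the bracket in (44). The cohort size and survival probability are hypothetical, and the helper functions are illustrative rather than taken from the paper.

```python
import math


def binomial_pmf(k, n, p):
    """Binomial probability, evaluated on the log scale to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def reciprocal_mean(l0, p0i):
    """Exact E(1/l_i | l_i > 0) for l_i ~ Binomial(l0, p0i)."""
    positive = 1.0 - (1.0 - p0i) ** l0
    return sum(binomial_pmf(k, l0, p0i) / k for k in range(1, l0 + 1)) / positive


def reciprocal_approximations(l0, p0i):
    """Leading term 1/E(l_i), and the same term times 1 + relative variance, cf. (44)."""
    leading = 1.0 / (l0 * p0i)
    with_variance = leading * (1.0 + (1.0 - p0i) / (l0 * p0i))
    return leading, with_variance


exact = reciprocal_mean(1000, 0.5)                 # hypothetical l_0 and p_{0i}
leading, with_variance = reciprocal_approximations(1000, 0.5)
print(exact, leading, with_variance)
```

With $l_0 p_{0i} = 500$ the relative variance is only about 0.001, which is why the bracket in (44) may be taken as unity.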


³ It is obvious from formulas (35) and (36) that $\hat q_i$ and $\hat p_i$ are defined only for positive values of $l_i$. If $l_i$ were equal to zero, $d_i$ and $l_{i+1}$ would certainly equal zero, and the biometric functions described in the life table, as well as the life table itself, would have ceased to be meaningful. Thus we shall use the convention that the denominator of (42) cannot take on the value zero before the interval $w + 1$, which is to say, before the termination of the life table.
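To derive the formula for the covariance between the proportions of survivors (or deaths) in two age intervals, we write

$$\sigma_{\hat p_i, \hat p_j} = E(\hat p_i\, \hat p_j) - p_i\, p_j, \qquad i \neq j;\ i, j = 0, 1, \ldots, w, \tag{45}$$

with, for $i < j$, the conditional expectation

$$E(\hat p_i\, \hat p_j) = E\left[\frac{l_{i+1}}{l_i} \cdot \frac{1}{l_j}\, E\left(l_{j+1} \mid l_i, l_{i+1}, l_j\right)\right].$$

Recalling from formula (14) that the conditional expectation of $l_{j+1}$ relative to $l_i$, $l_{i+1}$, and $l_j$ is the same as the conditional expectation of $l_{j+1}$ relative to $l_j$, we have

$$E(\hat p_i\, \hat p_j) = E\left[\frac{l_{i+1}}{l_i} \cdot \frac{1}{l_j}\, l_j\, p_j\right] = p_j\, E(\hat p_i) = p_i\, p_j. \tag{46}$$

Substitution of (46) in (45) gives the covariance

$$\sigma_{\hat p_i, \hat p_j} = E(\hat p_i\, \hat p_j) - p_i\, p_j = p_i\, p_j - p_i\, p_j = 0, \qquad i \neq j;\ i, j = 0, 1, \ldots, w. \tag{47}$$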


Remark 2: What is proved above is the zero covariance between $\hat p_i$ and $\hat p_j$, but not their independence. In fact, it can be shown [4] that $\hat p_i$ and $\hat p_j$ are not independently distributed; in particular, Greenwood's assumption [7] is proven to be false.

The findings in this section may be summarized in 


Lemma 3. The proportion of deaths, $\hat q_i$ (or of survivors, $\hat p_i$), in an age interval is an unbiased estimator of the probability of dying in (or of surviving) the interval, with a variance as given by (42); the covariance between two proportions $\hat q_i$ and $\hat q_j$ (or between $\hat p_i$ and $\hat p_j$) vanishes for every $i \neq j$, $i, j = 0, \ldots, w$.
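Lemma 3 can also be illustrated by simulation. The sketch below assumes the binomial chain $l_{i+1} \mid l_i \sim \text{Binomial}(l_i, p_i)$ implied by (13) and (14), with a hypothetical cohort of 200 and hypothetical $p_0$, $p_1$. With a fixed seed it shows sample means close to $p_0$ and $p_1$, a sample covariance close to zero, and a sample variance of $\hat p_1$ close to (43). It demonstrates zero covariance only; as Remark 2 notes, it says nothing about independence.

```python
import random


def simulate_survivors(l0, p, rng):
    """One cohort: l_{i+1} given l_i is Binomial(l_i, p_i), as implied by (13)-(14)."""
    survivors = [l0]
    for p_i in p:
        survivors.append(sum(rng.random() < p_i for _ in range(survivors[-1])))
    return survivors


def mean(xs):
    return sum(xs) / len(xs)


rng = random.Random(1960)       # fixed seed so the run is reproducible
l0, p = 200, [0.8, 0.6]         # hypothetical cohort size and p_0, p_1
reps = 4000
p0_hat, p1_hat = [], []
for _ in range(reps):
    l = simulate_survivors(l0, p, rng)
    p0_hat.append(l[1] / l[0])
    p1_hat.append(l[2] / l[1])  # l_1 = 0 is practically impossible with these values

m0, m1 = mean(p0_hat), mean(p1_hat)
cov01 = mean([(a - m0) * (b - m1) for a, b in zip(p0_hat, p1_hat)])
var1 = mean([(b - m1) ** 2 for b in p1_hat])
approx_var1 = p[1] * (1 - p[1]) / (l0 * p[0])    # formula (43)
print(m0, m1)              # near p_0 = 0.8 and p_1 = 0.6
print(cov01)               # near 0, as (47) states
print(var1, approx_var1)   # close to each other
```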


It should be pointed out that formula (47) of zero covariance is obtained only between proportions for two non-overlapping age intervals. If we are considering two intervals both beginning with the same age $x_\alpha$ and extending to the ages $x_i$ and $x_j$, respectively, the covariance between the proportions $\hat p_{\alpha i} = l_i / l_\alpha$ and $\hat p_{\alpha j} = l_j / l_\alpha$ is not equal to zero. Using the same approach as in the derivation of (42), it is easy to show that the formula for the covariance is given by

$$\sigma_{\hat p_{\alpha i}, \hat p_{\alpha j}} = E\left(\frac{1}{l_\alpha}\right) p_{\alpha j}(1 - p_{\alpha i}), \qquad \alpha < i \le j;\ \alpha, i, j = 0, \ldots, w. \tag{48}$$

When $l_0$ is large, we have the approximate formula

$$\sigma_{\hat p_{\alpha i}, \hat p_{\alpha j}} \approx \frac{1}{l_0\, p_{0\alpha}}\, p_{\alpha j}(1 - p_{\alpha i}), \qquad \alpha < i \le j. \tag{49}$$

For $i = j$, (48) and (49) become formulas for the variance of $\hat p_{\alpha i}$. If $x_\alpha = 0$, then $l_\alpha = l_0$ is a constant, and both formulas (48) and (49) reduce to

$$\sigma_{\hat p_{0i}, \hat p_{0j}} = \frac{1}{l_0}\, p_{0j}(1 - p_{0i}),$$

which is the covariance between the $x_i$- and $x_j$-year survival rates, and can be obtained directly from the covariance between $l_i$ and $l_j$ as given by (18).


6. DISTRIBUTION OF $\hat e_\alpha$, THE OBSERVED EXPECTATION OF LIFE AT AGE $x_\alpha$


The observed expectation of life at any age $x_i$ summarizes the mortality experience of the population under consideration beginning with age $x_i$, for $i = 0, 1, \ldots, w$. Certainly to the demographer or public health worker, this column is the most useful in the life table.

To avoid confusion in notation, let us denote by $\alpha$ a fixed number and by $x_\alpha$ a particular age; we are interested in the distribution of $\hat e_\alpha$, the observed expectation of life at the age $x_\alpha$. Consider $l_\alpha$, the survivors to the age $x_\alpha$, and let $Y_\alpha$ denote the future lifetime of a particular individual beyond the age $x_\alpha$. Clearly $Y_\alpha$ is a continuous random variable that can assume any positive real value. Let $y_\alpha$ be the value that the random variable $Y_\alpha$ takes on; thus $x_\alpha + y_\alpha$ is the entire length of life of the individual from the time of birth until death. Let $f(y_\alpha)$ be the probability density function of the random variable $Y_\alpha$ and $dy_\alpha$ an infinitesimal time interval. Since $Y_\alpha$ can assume values between $y_\alpha$ and $y_\alpha + dy_\alpha$ if and only if the individual of age $x_\alpha$ survives the age interval $(x_\alpha, x_\alpha + y_\alpha)$ and then dies in the interval $(x_\alpha + y_\alpha, x_\alpha + y_\alpha + dy_\alpha)$, the probability density function of $Y_\alpha$ is given by

$$f(y_\alpha)\, dy_\alpha = p_{x_\alpha, x_\alpha + y_\alpha}\, \mu(x_\alpha + y_\alpha)\, dy_\alpha, \tag{50}$$

where $p_{x_\alpha, x_\alpha + y_\alpha}$, the probability of surviving the interval $(x_\alpha, x_\alpha + y_\alpha)$, is defined in (10) and $\mu(x_\alpha + y_\alpha)$ is the force of mortality at age $x_\alpha + y_\alpha$ given in (1).
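The function $f(y_\alpha)$ in (50) is an honest probability density function in the sense that it is never negative and that its integral from $y_\alpha = 0$ to $y_\alpha = \infty$ is equal to unity. Clearly, it can never be negative, whatever may be the value of $y_\alpha$. To evaluate the integral, we recall formula (10) and write

$$\int_0^\infty f(y_\alpha)\, dy_\alpha = \int_0^\infty \exp\left\{-\int_{x_\alpha}^{x_\alpha + y_\alpha} \mu(t)\, dt\right\} \mu(x_\alpha + y_\alpha)\, dy_\alpha.$$

Now define a quantity $\Phi$ such that

$$\Phi = \int_{x_\alpha}^{x_\alpha + y_\alpha} \mu(t)\, dt,$$

and substitute the differential

$$d\Phi = \mu(x_\alpha + y_\alpha)\, dy_\alpha$$

in the integral to give the solution

$$\int_0^\infty f(y_\alpha)\, dy_\alpha = \int_0^\infty e^{-\Phi}\, d\Phi = 1.$$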


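The mathematical expectation of the random variable $Y_\alpha$ is the expected length of future life beyond the age $x_\alpha$, and thus may be called the true expectation of life at age $x_\alpha$. In accordance with the definition given the symbol $e_\alpha$, we may write

$$e_\alpha = \int_0^\infty y_\alpha\, f(y_\alpha)\, dy_\alpha = \int_0^\infty y_\alpha \exp\left\{-\int_{x_\alpha}^{x_\alpha + y_\alpha} \mu(t)\, dt\right\} \mu(x_\alpha + y_\alpha)\, dy_\alpha. \tag{51}$$

Thus both $e_\alpha$ and the variance of $Y_\alpha$,

$$\sigma^2_{Y_\alpha} = \int_0^\infty (y_\alpha - e_\alpha)^2 \exp\left\{-\int_{x_\alpha}^{x_\alpha + y_\alpha} \mu(t)\, dt\right\} \mu(x_\alpha + y_\alpha)\, dy_\alpha, \tag{52}$$

depend on the force of mortality.⁵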
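Formulas (50) to (52) can be evaluated numerically for any assumed force of mortality. The sketch below uses the trapezoid rule and truncates the lifetime integrals at a finite horizon `y_max`; both choices are assumptions of this illustration, not of the paper. It is checked on a constant force of mortality, for which $Y_\alpha$ is exponential and the exact values are known.

```python
import math


def life_expectancy_moments(mu, x_a, y_max, steps=100_000):
    """Trapezoid-rule approximations to the integral of f(y), to e_alpha (51)
    and to the variance (52), truncating the lifetime integrals at y_max.

    mu  -- force of mortality as a function of age (supplied by the user)
    x_a -- the fixed age x_alpha
    """
    h = y_max / steps
    ys, dens = [], []
    cum_hazard, prev_mu = 0.0, None
    for k in range(steps + 1):
        y = k * h
        cur_mu = mu(x_a + y)
        if prev_mu is not None:
            cum_hazard += 0.5 * h * (prev_mu + cur_mu)   # integral of mu over (x_a, x_a + y)
        prev_mu = cur_mu
        ys.append(y)
        dens.append(math.exp(-cum_hazard) * cur_mu)     # density f(y), formula (50)

    def integrate(g):
        vals = [g(y) * f for y, f in zip(ys, dens)]
        return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

    total = integrate(lambda y: 1.0)
    e_a = integrate(lambda y: y)
    var = integrate(lambda y: (y - e_a) ** 2)
    return total, e_a, var


# Constant force of mortality 0.1 (hypothetical): Y is exponential, so the
# exact values are 1, 1/0.1 = 10 and 1/0.1**2 = 100.
total, e_a, var = life_expectancy_moments(lambda x: 0.1, 40.0, 300.0)
print(total, e_a, var)
```

Any other hazard, for instance a Gompertz form `lambda x: b * math.exp(theta * x)` with hypothetical `b` and `theta`, can be passed in the same way; `y_max` must be long enough for the survival factor to be negligible.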
We will consider the future lifetimes of the $l_\alpha$ survivors as a sample of $l_\alpha$ independent and identically distributed random variables $Y_{\alpha 1}, \ldots, Y_{\alpha l_\alpha}$, each of which has the probability density function (50), the mathematical expectation (51), and the variance (52). According to the central limit theorem, as $l_\alpha$ approaches infinity, the distribution of the sample mean

$$\bar Y_\alpha = \frac{1}{l_\alpha}\left(Y_{\alpha 1} + \cdots + Y_{\alpha l_\alpha}\right)$$

is approximately normal, with a mean of $e_\alpha$ as given in (51). Clearly $\bar Y_\alpha$ is equal to $\hat e_\alpha$, the observed expectation of life at age $x_\alpha$.

⁵ While it is not the purpose of this paper to consider particular functions of the force of mortality, a separate study of the observed expectation of life under various assumptions of the force of mortality is in preparation.

As in the case of any continuous random variable, the value of $Y_\alpha$ is not accurately measured. In point of fact, the values of the $l_\alpha$ random variables are not individually recorded in the life table; rather, they are grouped in the form of a frequency table in which the ages $x_i$ and $x_{i+1}$ are the lower and upper limits for the interval $i$ and the deaths $d_i$ are the corresponding frequencies, for $i = \alpha, \alpha + 1, \ldots, w$. The sum of the frequencies equals the number of survivors at age $x_\alpha$, or

$$d_\alpha + d_{\alpha+1} + \cdots + d_w = l_\alpha.$$

The total number of years remaining to the $l_\alpha$ survivors depends on the exact age at which death occurs, or on the distribution of deaths within each age interval.

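Suppose that the distribution of deaths in each interval is such that, on the average, each of the $d_i$ persons lives $a_i n_i$ years in the age interval $(x_i, x_{i+1})$, where $n_i = x_{i+1} - x_i$ is the length of the interval and $a_i$ is a fraction; then on the average each of the $d_i$ persons will have lived $x_i + a_i n_i$ years, or $x_i - x_\alpha + a_i n_i$ years after age $x_\alpha$, and the observed expectation of life at age $x_\alpha$ is obviously

$$\hat e_\alpha = \frac{1}{l_\alpha} \sum_{i=\alpha}^{w} (x_i - x_\alpha + a_i n_i)\, d_i, \qquad \alpha = 0, 1, \ldots, w. \tag{53}$$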
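Using the relationship $d_i = l_i - l_{i+1}$ (with $l_{w+1} = 0$) and rearranging terms, we have a general formula for the observed expectation of life,

$$\hat e_\alpha = a_\alpha n_\alpha + \sum_{i=\alpha+1}^{w} c_i\, \hat p_{\alpha i}, \qquad \alpha = 0, 1, \ldots, w, \tag{54}$$

where $\hat p_{\alpha i} = l_i / l_\alpha$ and $c_i = (1 - a_{i-1}) n_{i-1} + a_i n_i$, for $i > \alpha$. Now, if $n_i = n$ for $i = \alpha, \alpha + 1, \ldots, w$, and if the distribution of deaths in each interval is assumed to be uniform so that $a_i = \tfrac{1}{2}$, then $c_i = n$ and (54) reduces to

$$\hat e_\alpha = n\left[\frac{1}{2} + \sum_{i=\alpha+1}^{w} \frac{l_i}{l_\alpha}\right], \tag{55}$$

a formula often used to compute the observed expectation of life at age $x_\alpha$.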
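Formulas (53) and (54) are two arrangements of the same sum, which a short computation confirms. In this sketch the last interval is given a finite closing age $x_{w+1}$, an assumption made only so that $n_w$ is defined; the life table values are hypothetical. With ten-year intervals and $a_i = 1/2$, formula (55) gives $10(1/2 + 1.9) = 24$, and both functions agree.

```python
def e_hat_direct(l, x, a, alpha):
    """Formula (53): average years lived beyond x_alpha by the l_alpha survivors.

    l -- survivors l_0, ..., l_w (l_{w+1} = 0: everyone dies in the last interval)
    x -- ages x_0, ..., x_{w+1}; the last interval is assumed closed at x_{w+1}
    a -- fractions a_0, ..., a_w of an interval lived, on average, by those dying in it
    """
    w = len(l) - 1
    n = [x[i + 1] - x[i] for i in range(w + 1)]
    d = [l[i] - (l[i + 1] if i < w else 0) for i in range(w + 1)]
    years = sum((x[i] - x[alpha] + a[i] * n[i]) * d[i] for i in range(alpha, w + 1))
    return years / l[alpha]


def e_hat_linear(l, x, a, alpha):
    """Formula (54): a_alpha n_alpha plus the sum over i > alpha of c_i * l_i / l_alpha."""
    w = len(l) - 1
    n = [x[i + 1] - x[i] for i in range(w + 1)]
    total = a[alpha] * n[alpha]
    for i in range(alpha + 1, w + 1):
        c_i = (1 - a[i - 1]) * n[i - 1] + a[i] * n[i]
        total += c_i * l[i] / l[alpha]
    return total


# Hypothetical abridged table with 10-year intervals and a_i = 1/2, so (55) applies.
l = [1000, 900, 700, 300]
x = [0, 10, 20, 30, 40]
a = [0.5, 0.5, 0.5, 0.5]
print(e_hat_direct(l, x, a, 0), e_hat_linear(l, x, a, 0))   # both 24, up to rounding
```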

Clearly, under the respective assumptions regarding the distribution of deaths in each age interval, the observed expectation of life given in formula (54) or (55) is an unbiased estimator of the corresponding true expectation of life as expressed in formula (51). On the other hand, because the mathematical expectation of the ratio of survivors, $l_i$ to $l_\alpha$, is equal to the ratio of their expectations, $l_0 p_{0i}$ to $l_0 p_{0\alpha}$, as shown in formula (41), the mathematical expectation of the observed expectation of life as given by (54) is simply

$$e_\alpha = a_\alpha n_\alpha + \sum_{i=\alpha+1}^{w} c_i\, p_{\alpha i}, \qquad \alpha = 0, 1, \ldots, w. \tag{56}$$
Formula (56) will be used in developing the formula for the variance of $\hat e_\alpha$. As a further aid in deriving the variance of $\hat e_\alpha$, it is convenient to note the relationship between $e_{i+1}$ and $e_i$, the expectations of life at the beginning of two consecutive intervals,

$$e_i - a_i n_i = \left[e_{i+1} + (1 - a_i) n_i\right] p_i, \qquad i = 0, 1, \ldots, w - 1. \tag{57}$$


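The variance of the observed expectation of life is obtained from (54), which expresses $\hat e_\alpha$ as a linear function of the proportions of survivors. Thus its variance is

$$\sigma^2_{\hat e_\alpha} = \sum_{i=\alpha+1}^{w} c_i^2\, \sigma^2_{\hat p_{\alpha i}} + 2 \sum_{i=\alpha+1}^{w} \sum_{j=i+1}^{w} c_i\, c_j\, \sigma_{\hat p_{\alpha i}, \hat p_{\alpha j}}. \tag{58}$$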


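Substituting formula (48) in (58), we have

$$\sigma^2_{\hat e_\alpha} = E\left(\frac{1}{l_\alpha}\right)\left[\sum_{i=\alpha+1}^{w} c_i^2\, p_{\alpha i}(1 - p_{\alpha i}) + 2 \sum_{i=\alpha+1}^{w} \sum_{j=i+1}^{w} c_i\, c_j\, p_{\alpha j}(1 - p_{\alpha i})\right]. \tag{59}$$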


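Using the relation $p_{\alpha j} = p_{\alpha i}\, p_{ij}$ and formula (56),

$$\sigma^2_{\hat e_\alpha} = E\left(\frac{1}{l_\alpha}\right)\left\{\sum_{i=\alpha+1}^{w} p_{\alpha i}\left[2 c_i e_i + c_i (c_i - 2 a_i n_i)\right] - (e_\alpha - a_\alpha n_\alpha)^2\right\}. \tag{60}$$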


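Since $c_i = (1 - a_{i-1}) n_{i-1} + a_i n_i$, the quantity inside the square brackets may be rewritten as

$$2 c_i e_i + c_i (c_i - 2 a_i n_i) = c_i\left[2 e_i + (1 - a_{i-1}) n_{i-1} - a_i n_i\right] = \left[e_i + (1 - a_{i-1}) n_{i-1}\right]^2 - (e_i - a_i n_i)^2.$$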
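Substitution of the last expression in (60) gives

$$\sigma^2_{\hat e_\alpha} = E\left(\frac{1}{l_\alpha}\right)\left\{\sum_{i=\alpha}^{w-1} p_{\alpha, i+1}\left[e_{i+1} + (1 - a_i) n_i\right]^2 - \sum_{i=\alpha}^{w} p_{\alpha i}\,(e_i - a_i n_i)^2\right\}, \tag{61}$$

where the term $(e_\alpha - a_\alpha n_\alpha)^2$ has been absorbed into the second sum ($p_{\alpha\alpha} = 1$). Making the substitutions $p_{\alpha, w+1} = 0$ and $(e_i - a_i n_i) = [e_{i+1} + (1 - a_i) n_i]\, p_i$ from (57) in (61), and combining terms, we obtain

$$\sigma^2_{\hat e_\alpha} = E\left(\frac{1}{l_\alpha}\right) \sum_{i=\alpha}^{w} \left(p_{\alpha, i+1} - p_{\alpha i}\, p_i^2\right)\left[e_{i+1} + (1 - a_i) n_i\right]^2.$$

Since $p_{\alpha, i+1} = p_{\alpha i}\, p_i$, and the term $i = w$ vanishes because $p_w = 0$, we have the final formula for the variance of the observed expectation of life at age $x_\alpha$,

$$\sigma^2_{\hat e_\alpha} = E\left(\frac{1}{l_\alpha}\right) \sum_{i=\alpha}^{w-1} p_{\alpha i}\, p_i\, q_i \left[e_{i+1} + (1 - a_i) n_i\right]^2, \qquad \alpha = 0, 1, \ldots, w - 1. \tag{62}$$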


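When $E(1/l_\alpha)$ is approximated by $1/E(l_\alpha) = 1/(l_0 p_{0\alpha})$, $p_{\alpha i}^2 / p_{0i}$ is written for $p_{\alpha i} / p_{0\alpha}$, and (43) is used for the variance of $\hat p_i$, formula (62) is reduced to

$$\sigma^2_{\hat e_\alpha} = \sum_{i=\alpha}^{w-1} p_{\alpha i}^2 \left[e_{i+1} + (1 - a_i) n_i\right]^2 \sigma^2_{\hat p_i}, \qquad \alpha = 0, 1, \ldots, w - 1. \tag{63}$$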

Thus we have proved a rather useful theorem in the study of the life table.


Theorem. If the distribution of deaths in the age interval $(x_i, x_{i+1})$ is such that, on the average, each of the $d_i$ individuals lives $a_i n_i$ years in the interval, for $i = \alpha, \alpha + 1, \ldots, w$, then as $l_\alpha$ approaches infinity, the probability distribution of the observed expectation of life at age $x_\alpha$ as given by (54) is asymptotically normal and has the mean and the variance as given by (56) and (63), respectively.


It should be noted that (63), which is an approximation to the exact formula (62) for the variance of $\hat e_\alpha$ when $l_\alpha$ is a random variable, is identical with (62) when $l_\alpha$ is a given constant, such as $l_0$.
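The following sketch evaluates the approximate variance (63) from assumed true probabilities, computing the $e_i$ by running (57) backwards from $e_w = a_w n_w$, and compares it with a Monte Carlo variance of $\hat e_0$ from (54) under the binomial survival chain. All inputs are hypothetical, and the function names are illustrative. For $\alpha = 0$ the cohort size is fixed, so, as remarked above, (63) coincides with (62) and the simulation should agree up to Monte Carlo error.

```python
import random


def expectations(p, n, a):
    """True e_0, ..., e_w from (57), run backwards from e_w = a_w n_w."""
    w = len(p)
    e = [0.0] * (w + 1)
    e[w] = a[w] * n[w]
    for i in range(w - 1, -1, -1):
        e[i] = a[i] * n[i] + p[i] * (e[i + 1] + (1 - a[i]) * n[i])
    return e


def variance_e_hat(p, n, a, l0, alpha=0):
    """Approximate variance (63), with the variance of p_hat_i taken from (43).

    p -- p_0, ..., p_{w-1}; n -- n_0, ..., n_w; a -- a_0, ..., a_w; l0 -- cohort size.
    """
    w = len(p)
    e = expectations(p, n, a)
    p0 = [1.0]                                   # p_{0i}
    for p_i in p:
        p0.append(p0[-1] * p_i)
    total = 0.0
    for i in range(alpha, w):
        p_alpha_i = p0[i] / p0[alpha]
        var_p_i = p[i] * (1 - p[i]) / (l0 * p0[i])              # (43)
        total += p_alpha_i**2 * (e[i + 1] + (1 - a[i]) * n[i]) ** 2 * var_p_i
    return total


def simulated_variance(p, n, a, l0, reps, seed=1):
    """Monte Carlo variance of e_hat_0 from (54), with l_{i+1} ~ Binomial(l_i, p_i)."""
    rng = random.Random(seed)
    w = len(p)
    c = [None] + [(1 - a[i - 1]) * n[i - 1] + a[i] * n[i] for i in range(1, w + 1)]
    values = []
    for _ in range(reps):
        l = [l0]
        for p_i in p:
            l.append(sum(rng.random() < p_i for _ in range(l[-1])))
        values.append(a[0] * n[0] + sum(c[i] * l[i] / l0 for i in range(1, w + 1)))
    m = sum(values) / reps
    return sum((v - m) ** 2 for v in values) / (reps - 1)


p, n, a = [0.9, 0.7, 0.4], [10, 10, 10, 10], [0.5, 0.5, 0.5, 0.5]   # hypothetical
formula = variance_e_hat(p, n, a, l0=500)
simulated = simulated_variance(p, n, a, l0=500, reps=3000)
print(formula, simulated)
```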

As a matter of practical interest, the following corollary deserves 
particular mention. 


Corollary: If the age interval is constant, that is, if $n_i = n$, and if the distribution of deaths in each interval is such that, on the average, each of the $d_i$ individuals lives half the interval $(x_i, x_{i+1})$, for $i = \alpha, \alpha + 1, \ldots, w$, then the variance of the observed expectation of life at age $x_\alpha$ as given by (55) is

$$\sigma^2_{\hat e_\alpha} = n\, E\left(\frac{1}{l_\alpha}\right) \sum_{i=\alpha+1}^{w} p_{\alpha i}\left[2 e_i - \left(e_\alpha - \frac{n}{2}\right)\right], \qquad \alpha = 0, \ldots, w. \tag{64}$$

634 BIOMETRICS, DECEMBER 1960


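Proof. When $n_i = n$ and $a_i = \tfrac{1}{2}$, we have $c_i = n$ and $c_i - 2 a_i n_i = 0$. From (60) we have

$$\sigma^2_{\hat e_\alpha} = E\left(\frac{1}{l_\alpha}\right)\left[2n \sum_{i=\alpha+1}^{w} p_{\alpha i}\, e_i - \left(e_\alpha - \frac{n}{2}\right)^2\right],$$

which can be rewritten as (64), since $e_\alpha - n/2 = n \sum_{i=\alpha+1}^{w} p_{\alpha i}$ by (56).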

Remark 3: Although the theorem concerning the asymptotic distribution of the observed expectation of life is true for both the cohort and the current life table, it is not clear why formula (63) holds also for the latter case. In the current life table, the basic random variable $\hat q_i$ is computed from actual mortality experience and, in general, its variance is not given by either formula (42) or formula (43). Therefore it is essential to prove (63) from the viewpoint of the current life table.⁴

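The observed expectation of life, as given in (54), is a linear function of $\hat p_{\alpha i}$, which, in the current life table, is computed from

$$\hat p_{\alpha i} = \hat p_\alpha\, \hat p_{\alpha+1} \cdots \hat p_{i-1}, \qquad \alpha < i. \tag{65}$$

Clearly, the derivatives taken at the true point $(p_\alpha, p_{\alpha+1}, \ldots, p_{i-1})$ are

$$\frac{\partial \hat p_{\alpha i}}{\partial \hat p_j} = \frac{p_{\alpha i}}{p_j}, \quad \alpha \le j < i; \qquad \frac{\partial \hat p_{\alpha i}}{\partial \hat p_j} = 0, \quad i \le j, \tag{66}$$

and hence

$$\frac{\partial \hat e_\alpha}{\partial \hat p_i} = p_{\alpha i}\left[e_{i+1} + (1 - a_i) n_i\right]. \tag{67}$$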


⁴ Using a different approach [12], Professor E. B. Wilson derived the following formula for the variance of $\hat e_\alpha$,


1 w-l 
72 > + ’ 


which is in error in that the quantity $a_i n_i$ should be replaced by $(1 - a_i) n_i$.
Because of (66), the derivative (67) vanishes when $i = w$. Since it has been shown in Lemma 3 that the covariance between proportions of survivors of two non-overlapping age intervals is zero, the variance of the observed expectation of life may be computed from the following:

$$\sigma^2_{\hat e_\alpha} = \sum_{i=\alpha}^{w-1}\left[\frac{\partial \hat e_\alpha}{\partial \hat p_i}\right]^2 \sigma^2_{\hat p_i}. \tag{68}$$

Substitution of (67) in (68) gives formula (63).
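When the distribution of deaths within each age interval is assumed to be uniform, $a_i = \tfrac{1}{2}$, and (63) becomes

$$\sigma^2_{\hat e_\alpha} = \sum_{i=\alpha}^{w-1} p_{\alpha i}^2 \left[e_{i+1} + \tfrac{1}{2} n_i\right]^2 \sigma^2_{\hat p_i}.$$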
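The delta-method step (66) to (68) can be checked numerically. Because $\hat e_\alpha$ in (54), with $\hat p_{\alpha i}$ built as the product (65), is linear in each $\hat p_i$ separately, a central difference reproduces the derivative (67) up to rounding error. The inputs below are hypothetical, the function names are illustrative, and `e_hat_from_props` evaluated at the true $p_i$ gives the $e_i$ of (56).

```python
def e_hat_from_props(p_hat, n, a, alpha=0):
    """Formula (54), with p_hat_{alpha i} formed as the product (65).

    p_hat -- proportions surviving p_hat_0, ..., p_hat_{w-1}
    n, a  -- interval lengths and fractions for i = 0, ..., w
    """
    w = len(p_hat)
    total, prod = a[alpha] * n[alpha], 1.0
    for i in range(alpha + 1, w + 1):
        prod *= p_hat[i - 1]                       # p_hat_{alpha i}
        total += ((1 - a[i - 1]) * n[i - 1] + a[i] * n[i]) * prod
    return total


def derivative_formula(p, n, a, i, alpha=0):
    """Right-hand side of (67): p_{alpha i} [e_{i+1} + (1 - a_i) n_i]."""
    p_alpha_i = 1.0
    for j in range(alpha, i):
        p_alpha_i *= p[j]
    e_next = e_hat_from_props(p, n, a, alpha=i + 1)    # e_{i+1} from (56)
    return p_alpha_i * (e_next + (1 - a[i]) * n[i])


def derivative_numeric(p, n, a, i, alpha=0, h=1e-6):
    """Central difference of e_hat_alpha with respect to p_hat_i at the true point."""
    up, down = list(p), list(p)
    up[i] += h
    down[i] -= h
    return (e_hat_from_props(up, n, a, alpha) - e_hat_from_props(down, n, a, alpha)) / (2 * h)


p = [0.95, 0.85, 0.6, 0.3]          # hypothetical p_0, ..., p_3
n = [1, 4, 5, 10, 10]               # hypothetical n_0, ..., n_4
a = [0.1, 0.4, 0.5, 0.5, 0.6]       # hypothetical a_0, ..., a_4
for i in range(len(p)):
    print(i, derivative_formula(p, n, a, i), derivative_numeric(p, n, a, i))
```

Squaring these derivatives and weighting them by the variances of $\hat p_i$, as in (68), gives the variance (63).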
SAMPLE SIZE FOR A SPECIFIED WIDTH CONFIDENCE
INTERVAL ON THE VARIANCE OF A
NORMAL DISTRIBUTION


FRANKLIN A. GRAYBILL¹ AND ROBERT D. MORRISON


Oklahoma State University, Stillwater, Oklahoma, U.S. A. 


If an experimenter decides to use a confidence interval to locate a parameter, he is concerned with at least two things: (1) Does the interval contain the parameter? (2) How wide is the interval? In general, the answers to these questions cannot be given with absolute certainty, but must be given with a probability statement. If we let $1 - \alpha$ be the probability that the interval contains the parameter, and let $\beta'$ be the probability that the width is less than $d$ units, then the general procedure is to fix $1 - \alpha$ in advance and compute $\beta'$. The value of $\beta'$ is in general a function of the positive integer $n$, the sample size on which the confidence interval is computed; $\beta'$ is also a function of $\alpha$. In most confidence intervals, $\beta'$ increases as $n$ increases. For any particular situation $\beta'$ may be too low to be useful, hence an experimenter may wish to increase $\beta'$ by taking more observations (increasing $n$). The problem the experimenter then faces is the determination of the sample size $n$ such that (A) the probability will be equal to $1 - \alpha$ that the confidence interval contains the parameter, and (B) the probability will be greater than or equal to $\beta'$ that the width of the confidence interval will be less than $d$ units (where $\alpha$, $\beta'$, and $d$ are specified). $1 - \alpha$ will be called the confidence coefficient, and $\beta'$ will be called the width coefficient.

Solving this problem will generally require two things: (1) the form of the frequency function from which the sample of size $n$ is to be selected; (2) some previous information on the unknown parameters in the frequency function.

This suggests that the sample be taken in two steps; the first sample 
will be used to determine the number of observations to be taken in 
the second sample so that (A) and (B) will be satisfied. 

For a confidence interval on the mean of a normal population with unknown variance this problem has been solved by Stein [2] for $\beta' = 1$.

The purpose of this paper is to illustrate a method for determining $n$ to satisfy (A) and (B) for the variance of a normal distribution. A set of tables needed for the solution of this problem is presented. The theory for this paper is presented in [1]. The notation is the same as in [1], except that $\alpha$ in [1] is replaced here by $1 - \alpha$.

Suppose that a confidence interval with confidence coefficient $1 - \alpha$ is desired on the variance $\sigma^2$ of a normal population. If a sample $v_1, \ldots, v_n$ of size $n$ is taken and

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (v_i - \bar v)^2$$

is computed, then the confidence interval is

$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}(n)} \le \sigma^2 \le \frac{(n - 1)s^2}{\chi^2_{1-(\alpha/2)}(n)}, \tag{1}$$

where $\chi^2_\gamma(t)$ is the upper $100\gamma$ percentage point of a chi-square variate with $t - 1$ degrees of freedom. Before the sample $v_1, \ldots, v_n$ is taken, it is desired to determine $n$ so that the probability is at least $\beta'$ that the width is less than $d$ units. Suppose that an unbiased estimate of $\sigma^2$ based on $m - 1$ degrees of freedom is available from previous data. Denote this estimate by $z/(m - 1)$, and assume that $z/\sigma^2$ is a chi-square variate with $m - 1$ degrees of freedom. To determine $n$, compute $g$ by the formula


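$$g = \frac{d\, \chi^2_\beta(m)}{z}, \qquad \text{where } \beta = \sqrt{\beta'},$$

and the degrees of freedom $n - 1$ on which to base $s^2$ can be found from the tables for specified values of $\beta'$ and $1 - \alpha$: they are the smallest tabulated degrees of freedom whose entry $g_n$ does not exceed $g$. Tables are included for confidence coefficients $1 - \alpha = .50, .80, .90, .95, .98, .99$; for width coefficients $\beta' = .5625, .8100, .9025, .9506, .9801, .9900, .9980$; for d.f. $= 2(1)30, 40(10)100$. The entries $g_n$ are computed by the formula

$$g_n = \chi^2_{1-\beta}(n)\left[\frac{1}{\chi^2_{1-(\alpha/2)}(n)} - \frac{1}{\chi^2_{\alpha/2}(n)}\right].$$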


For example, suppose $v$ is distributed normally with mean $\mu$ and variance $\sigma^2$, and we want to determine the sample size $n$ so that a confidence interval on $\sigma^2$ will have confidence coefficient 95 percent, width less than 6.0 units, and width coefficient 81 percent. Suppose that a previous sample of 11 values (denoted by $y_1, \ldots, y_{11}$) from this distribution yielded

$$\sum_{i=1}^{11} (y_i - \bar y)^2 = 15.$$

From this information we get $m = 11$, $z = 15$, $1 - \alpha = .95$, $\beta' = .81$, $\beta = .90$, $d = 6.0$, $\chi^2_\beta(m) = 4.87$; hence $g = 1.95$; and from the tables we get d.f. $= 23$, $n = 24$. This says that, if we draw a random sample
TABLE 1
SAMPLE SIZE FOR SPECIFIED WIDTH CONFIDENCE INTERVAL ON
THE VARIANCE OF A NORMAL POPULATION

              1 - α = .50                      |              1 - α = .80
β     .75    .90    .95    .975   .99    .995  |  .75    .90    .95    .975   .99    .995
β'   .5625  .8100  .9025  .9506  .9801  .9900  | .5625  .8100  .9025  .9506  .9801  .9900
d.f.                   g_n                     |                   g_n
2 3.82 6.34 8.25 10.16 12.69 14.60 12.56 20.85 27.13 33.41 41.71 47.99 
3 2.39 3.63 4.54 543 6.59 7.46 6.37. 9.70 12.12 14.50 17.60 19.92 
4 1.80 2.60 3.17 3.73 4.44 4.97 4.37 6.31 7.70 9.04 10.78 12.06 
5 148 2.06 247 286 3.36 3.73 3.40 4.74 5.68 6.58 7.74 8.59 
6 1.27 1.72 2.04 2.34 2.72 3.00 2.82 3.33 4.53 5.20 6.05 6.67 
7 1.12 149 1.75 199 230 2.52 2.44 3.24 3.79 4.32 4.98 5.47 
x 102) 1.3. 1.54 1.74 200 2.18 2.16 2.83 3.28 3.71 4.25 4.65 
9 93 1.20 138 1.55 1.77 1,93 1.96 2.52 2.91 3.27 3.72 4.05 
10 86 1.10 1.26 1.41 1.60 1.73 1.79 2.29 2.62 293 3.32 3.60 
il Sl 102 1.16 1.29 1.46 = 1.58 1.66 2.10 2.39 2.66 3.00 3.25 
12 76 95 108 1.19. 1.34 1.45 1.55 194 2.20 2.44 2.75 2.96 
13 72 S9 1.01 1.11 1.25 1.34 1.46 1.81 2.05 2.26 2.53 2.73 
M4 6s $4 95 1.04 1.16 1,25 1.38 .1.70 192 2.11 2.36 2.53 
Fs) 65 80 $9 98 109 1,17 1.32 1.61 1.80 1.98 2.21 2.37 
16 63 76 “85 93 103 1.11 1.26 1.53 1.71 1.87 2.08 2.22 
17 60 73 $1 89 98 1.05 1.20 146 1.62 1.77 1.96 2.10 
18 58 70 77 85 93 1,00 1.16 1.39 1.55 1.69 1.86 1.99 
19 6 67 74 81 89 95 1.11 1.33 148 1.61 1.78 1.89 
20 54 65 71 78 85 91 1.08 1.28 142 1.54 1.70 1.81 
2 53 62 69 75 82 87 1.04 1.24 1.36 148 1.63 1.73 
22 | 60 .67 72 79 84 1.01 1.19 1.32 143 1.56 1.66 
23 50 59 64 70 .76 81 98 1.16 1.27 1.37 1.50 1.60 
24 48 57 62 67 .74 .78 95 112 1.23 133 1.45 1.54 
25 AT 55 60 65 71 75 93 1.09 1.19 1.29 1.40 1.48 
26 46 54 59 63 69 73 90 106 1.16 1.25 1.36 1.43 
27 45 5 57 61 67 71 88 103 1.12 1.21 1.31 1.39 
28 44 51 56 60 65 69 86 100 1.09 1.17 1.28 1.35 
29 43 50 54 58 63 67 84 98 106 1.14 1.24 1.31 
30 42 49 3 57 62 65 82 95 104 1.11 1.21 1,27 
40 36 40 43 46 50 f2 .69 .78 84 90 96 1.01 
0 31 35 37 40 42 44 60 .68 72 76 82 85 
60 28 31 33 35 37 39 54 64 74 
70 26 28 30 32 33 35 0 5 58 61 64 .67 
dU 24 26 28 20 20 32 AG 50 58 61 
22 24 26 aa 28 29 43 47 49 54 
100 21 23 24 25 26 27 40 44 46 48 50 52 


of size 24 from the above distribution, the probability is at least .81 that the 95 percent confidence interval on $\sigma^2$ will be less than or equal to 6 units in length.
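The two-stage computation in the example can be reproduced without tables. The sketch below is an illustration under stated assumptions, not the authors' program. It computes chi-square percentage points from the series for the regularized incomplete gamma function plus bisection, forms $g = d\,\chi^2_\beta(m)/z$ with $\beta = \sqrt{\beta'}$, and returns the smallest $n$ whose $g_n$ does not exceed $g$. All function names are hypothetical.

```python
import math


def chi2_cdf(x, df):
    """Lower-tail chi-square CDF, from the series for the regularized gamma P(df/2, x/2)."""
    if x <= 0.0:
        return 0.0
    s, t = df / 2.0, x / 2.0
    term = math.exp(s * math.log(t) - t - math.lgamma(s + 1.0))
    total, k = term, 1
    while term > 1e-16 * total:
        term *= t / (s + k)
        total += term
        k += 1
    return min(total, 1.0)


def chi2_upper(gamma, n):
    """The paper's chi^2_gamma(n): upper 100*gamma percentage point with n - 1 d.f."""
    df = n - 1
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 10.0
    for _ in range(60):                                  # bisection on the CDF
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < 1.0 - gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def g_n(n, alpha, beta):
    """Table entry: chi^2_{1-beta}(n) [1/chi^2_{1-alpha/2}(n) - 1/chi^2_{alpha/2}(n)]."""
    c_n = 1.0 / chi2_upper(1.0 - alpha / 2.0, n) - 1.0 / chi2_upper(alpha / 2.0, n)
    return chi2_upper(1.0 - beta, n) * c_n


def two_stage_sample_size(d, z, m, alpha, beta_prime, n_max=1000):
    """Smallest n whose g_n does not exceed g = d chi^2_beta(m) / z, with beta = sqrt(beta')."""
    beta = math.sqrt(beta_prime)
    g = d * chi2_upper(beta, m) / z
    for n in range(3, n_max + 1):
        if g_n(n, alpha, beta) <= g:
            return n, g
    raise ValueError("no n up to n_max satisfies the width requirement")


# The worked example: m = 11, z = 15, d = 6.0, 1 - alpha = .95, beta' = .81
n, g = two_stage_sample_size(d=6.0, z=15.0, m=11, alpha=0.05, beta_prime=0.81)
print(n, round(g, 2))
```

With the example's inputs it returns $n = 24$ and $g \approx 1.95$, matching the table lookup. Because the search uses unrounded percentage points, its answer can differ by one from a lookup in the two-decimal table when $g$ falls very close to a tabulated entry.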

The tables can also be used to solve a somewhat different problem. For example, suppose we want to set a confidence interval on the variance $\sigma^2$ of a normal distribution such that the width of the interval is less than $p$ percent of the variance, with width coefficient $\beta$. Finding the sample


TABLE 1 CONTINUED
SAMPLE SIZE FOR SPECIFIED WIDTH CONFIDENCE INTERVAL ON
THE VARIANCE OF A NORMAL POPULATION

              1 - α = .90                      |              1 - α = .95
β     .75    .90    .95    .975   .99    .995  |  .75    .90    .95    .975   .99    .995
β'   .5625  .8100  .9025  .9506  .9801  .9900  | .5625  .8100  .9025  .9506  .9801  .9900
d.f.                   g_n                     |                   g_n

2 26.57 44.12 57.41 70.69 88.25 101.53 | 54.39 90.33 117.53 144.72 180.67 207.86 
3 11.15 16.97 21.21 25.37 30.79 34.85] 18.60 28.30 35.38 42.32 51.36 58.12 
4 7.01 10.13 12.35 14.50 17.28 19.34] 10.63 15.36 18.73 22.00 26.22 29.34 
5 5.19 7.23 8.66 10.04 11.81 13.11 745 10.39 12.46 14.44 16.97 18.85 
6 4.17 5.66 6.70 7.69 8.94 9.87 5.79 7.87 9.30 10.68 12.42 13.71 
7 3.53 469 549 6.25 7.21 7.91 4.78 6.36 7.45 848 9.78 10.73 
8 3.08 4.03 4.67 5.29 6.06 6.62 4.11 5.37 6.23 7.04 8.07 8.82 
9 2.75 3.55 4.09 4.60 5.24 5.70 3.62 4.67 5.38 6.04 6.88 7.50 
10 2.50 3.18 3.65 4.08 4.62 5.02 3.25 4.14 4.74 5.31 6.01 6.53 
11 2.30 2.90 3.30 3.68 415 4.49 297 3.74 426 4.74 56.35 5.79 
12 2.13 2.67 3.02 3.36 3.77 4.07 2.73 342 387 430 483 §.21 
13 2.00 248 2.80 3.09 3846 3.73 2.55 3.15 3.56 304 4.41 4.75 
14 1.88 2.32 2.60 2.87 3.20 3.44 2.39 294 330 3864 4.06 4.37 
15 1.78 2.18 244 2.69 2.99 3.21 2.25 2.75 308 339 3.77 4.04 
16 1.70 2.06 2.30 2.53 2.80 3.00 2.13 2.59 290 3.18 3.52 3.77 
17 1.62 196 2.18 2.39 2.64 2.82 2.03 2.45, 2.73 2.99 3.31 3.54 
18 1.55 187 207 2.27 2.50 2.67 1.94 2.33 2.59 2.83 3.12 3.32 
19 149 1.79 198 2.16 2.38 2.53 1.86 2.23 2.47 2.69 2.96 3.16 
20 144 1.71 1.89 2.06 2.27 2.41 1.79 2.13 2.36 2.56 2.82 3.00 
21 1.39 165 182 197 2.17 2.30 1.72 2.05 2.26 2.45 2.69 ~- 2.86 
22 1.34 159 1.75 190 2.08 2.21 1.66 1.97 2.17 2.35 2.57 2.73 
23 1.30 1.54 1.69 1.83 2.00 2.12 1.61 190 2.09 2.26 2.47 2.62 
24 1.26 149 163 1.76 192 2.04 1.56 183 201 2.17 2.37 2.52 
25 1.23 1.44 1.58 1.70 1.86 1.97 1.51 1.77 194 2.10 2.29 2.42 
26 1.20 140 1.53 1.65 1.79 -1.90 147 1.72 188 203 2.21 2.34 
27 1.17 1.36 148 160 1.74 1.84 143 1.67 182 196 2.14 2.26 
28 1.14 1.32 144 1,55 1.68 1.78 140 162 1.77 190 2.07 2.18 
29 111 1.29 1.40 51 1.64 1.73 136 1.58 1.72 1.85 2.01 2.12 
30 1.09 1.26 1.37 1.47 1.59 1.68 1.33 1.54 1.68 1.80 1.95 2.05 
40 90 103 1.10 41.17 1.26 1.32 1.10 1.25 134 143 1.53 1.61 
50 .79 83 94 1.00 1.06 Lil 95 1.07 1.14 1.21 1.29 1.34 
60 .70 78 83 88 93 97 85 94 100 106 1.12 1.17 
70 .64 71 75 .79 83 86 77 85 80 25 1.00 1,04 
80 65 69 .76 78 72 78 83 87 9). 
90 61 oF 66 -70 72 67 .73 77 80 &A 87 
100 52 57 60 62 65 67 63 68 72 75 .78 81 


size $n$ on which to base the confidence interval does not require the sample to be taken in two steps.

The theory is as follows. A confidence interval on $\sigma^2$ based on a sample of size $n$ is given in equation (1). The width is $w = c_n (n - 1) s^2$, where

$$c_n = \frac{1}{\chi^2_{1-(\alpha/2)}(n)} - \frac{1}{\chi^2_{\alpha/2}(n)}.$$
TABLE 1 CONTINUED
SAMPLE SIZE FOR SPECIFIED WIDTH CONFIDENCE INTERVAL ON
THE VARIANCE OF A NORMAL POPULATION

              1 - α = .98                      |              1 - α = .99
β     .75    .90    .95    .975   .99    .995  |  .75    .90    .95    .975   .99    .995
β'   .5625  .8100  .9025  .9506  .9801  .9900  | .5625  .8100  .9025  .9506  .9801  .9900
d.f.                   g_n                     |                   g_n
2    137.64 228.61 297.43 366.25 457.23 526.04 | 276.44 459.16 597.39 735.61 918.33 1056.54
3     35.42  53.89  67.37  80.59  97.80 110.67 |  56.96  86.68 108.35 129.62 157.30 178.00
4     17.72  25.60  31.22  36.67  43.69  48.90 |  25.65  37.06  45.20  53.09  63.25  70.79
5     11.51  16.05  19.24  22.30  26.22  29.11 |  15.70  21.88  26.23  30.40  35.74  39.68

6            11.57  13.69  15.71  18.28  20.16 |  11.18  15.18  17.96  20.60  23.97  26.45

7      6.80   9.05  10.59  12.06  13.91  15.27 |   8.69  11.55  13.53  15.40  17.76  19.50
8      5.70   7.45   8.65   9.78  11.20  12.24 |   7.14   9.33  10.83  12.24  14.03  15.33
9      4.93   6.35   7.32   8.23   9.38  10.21 |   6.08   7.84   9.03  10.16  11.57  12.60
10     4.36   5.56   6.37   7.12   8.07   8.76 |   5.32   6.78   7.76   8.69   9.84  10.68
11     3.93   4.96   5.65   6.29   7.10   7.68 |   4.75   5.99   6.82   7.60   8.57   9.28
12     3.59   4.49   5.09   5.65   6.34   6.85 |   4.31   5.38   6.10   6.77   7.60   8.21
13     3.31   4.11   4.64   5.13   5.74   6.18 |   3.95   4.89   5.52   6.11   6.84   7.36
14     3.09   3.80   4.27   4.71   5.25   5.65 |   3.65   4.50   5.06   5.58   6.22   6.69
15     2.89   3.54   3.96   4.36   4.85   5.20 |   3.41   4.17   4.67   5.14   5.71   6.13
16     2.73   3.31   3.70   4.06   4.51   4.82 |   3.20   3.89   4.35   4.77   5.29   5.66
17     2.58   3.12   3.48   3.81   4.21   4.51 |   3.02   3.65   4.07   4.45   4.93   5.27
18     2.46   2.96   3.29   3.59   3.96   4.23 |   2.87   3.45   3.83   4.18   4.62   4.93
19     2.35   2.81   3.12   3.40   3.74   3.99 |   2.73   3.27   3.62   3.95   4.35   4.64
20     2.25   2.68   2.97   3.23   3.55   3.78 |   2.61   3.11   3.44   3.74   4.11   4.38
21     2.16   2.57   2.83   3.08   3.38   3.59 |   2.50   2.97   3.28   3.56   3.91   4.15
22     2.08   2.46   2.71   2.94   3.22   3.42 |   2.40   2.85   3.13   3.40   3.72   3.95
23     2.01   2.37   2.61   2.82   3.08   3.27 |   2.32   2.73   3.00   3.25   3.55   3.77
24     1.94   2.29   2.51   2.71   2.96   3.14 |   2.24   2.63   2.88   3.12   3.40   3.61
25     1.88   2.21   2.42   2.61   2.85   3.01 |   2.16   2.54   2.78   3.00   3.27   3.46
26     1.83   2.14   2.34   2.52   2.74   2.90 |   2.10   2.45   2.68   2.89   3.14   3.33
27     1.78   2.07   2.26   2.43   2.65   2.80 |   2.04   2.37   2.59   2.79   3.03   3.20
28     1.73   2.01   2.19   2.36   2.56   2.70 |   1.98   2.30   2.51   2.70   2.93   3.09
29     1.68   1.95   2.13   2.29   2.48   2.62 |   1.93   2.23   2.43   2.61   2.83   2.99
30     1.64   1.90   2.07   2.22   2.40   2.53 |   1.88   2.17   2.36   2.53   2.74   2.89
40     1.34   1.52   1.64   1.75   1.87   1.96 |   1.52   1.73   1.86   1.98   2.12   2.22
50     1.16   1.30   1.39   1.47   1.56   1.63 |   1.30   1.46   1.56   1.65   1.76   1.84
60     1.03   1.14   1.21   1.28   1.36   1.41 |   1.16   1.28   1.37   1.44   1.53   1.59
70      .93   1.03   1.09   1.14   1.21   1.26 |   1.05   1.16   1.22   1.28   1.36   1.41
80      .86    .94   1.00   1.04   1.10   1.14 |    .96   1.06   1.12   1.17   1.23   1.27
90      .80    .88    .92    .96   1.01   1.04 |    .90    .98   1.03   1.07   1.13   1.17
100     .75    .82    .86    .90    .94    .97 |    .84    .91    .96   1.00   1.05   1.08


The problem is to find $n$ such that the probability is equal to $\beta$ that $w$ is less than $p$ percent of $\sigma^2$, or in other words to find $n$ such that

$$P\left[w \le \frac{p}{100}\,\sigma^2\right] = P\left[\frac{c_n (n - 1) s^2}{\sigma^2} \le \frac{p}{100}\right] = \beta.$$

But $(n - 1)s^2/\sigma^2$ is distributed as chi-square with $n - 1$ degrees of freedom. Therefore, the following probability statement holds:

$$P\left[\frac{(n - 1)s^2}{\sigma^2} \le \chi^2_{1-\beta}(n)\right] = \beta.$$

So the solution to the problem is the smallest integer $n$ which satisfies

$$\frac{p}{100\, c_n} \ge \chi^2_{1-\beta}(n),$$

or

$$g_n = c_n\, \chi^2_{1-\beta}(n) \le \frac{p}{100}.$$

For example, suppose it is desired to set a 95 percent confidence interval on $\sigma^2$ with width less than 84 percent of $\sigma^2$, and with width coefficient .99. Here $1 - \alpha = .95$, $\beta = .99$, $\beta' = .9801$, $p/100 = .84 = g_n$, and the tables are entered at $1 - \alpha = .95$, $\beta' = .9801$, and we get d.f. $= 90$. So $n = 91$.
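The single-stage problem needs no preliminary sample, so the search is simpler: find the smallest $n$ with $c_n\,\chi^2_{1-\beta}(n) \le p/100$. This sketch repeats the chi-square helpers (series for the regularized incomplete gamma function, then bisection) so that it runs on its own; the helper names are hypothetical, and the code is an illustration rather than the procedure used to build the tables.

```python
import math


def chi2_cdf(x, df):
    """Lower-tail chi-square CDF, from the series for the regularized gamma function."""
    if x <= 0.0:
        return 0.0
    s, t = df / 2.0, x / 2.0
    term = math.exp(s * math.log(t) - t - math.lgamma(s + 1.0))
    total, k = term, 1
    while term > 1e-16 * total:
        term *= t / (s + k)
        total += term
        k += 1
    return min(total, 1.0)


def chi2_upper(gamma, n):
    """Upper 100*gamma percentage point of chi-square with n - 1 d.f., by bisection."""
    df = n - 1
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < 1.0 - gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def one_stage_sample_size(p_percent, alpha, beta, n_max=2000):
    """Smallest n with c_n chi^2_{1-beta}(n) <= p/100 (no preliminary sample needed)."""
    target = p_percent / 100.0
    for n in range(3, n_max + 1):
        c_n = 1.0 / chi2_upper(1.0 - alpha / 2.0, n) - 1.0 / chi2_upper(alpha / 2.0, n)
        if c_n * chi2_upper(1.0 - beta, n) <= target:
            return n
    raise ValueError("no n up to n_max satisfies the width requirement")


# The example above: 95 percent interval, width below 84 percent of sigma^2, beta = .99.
n_needed = one_stage_sample_size(84, alpha=0.05, beta=0.99)
print(n_needed)
```

Since the table entries are rounded to two decimals, a direct search of this kind may land a unit or two away from the tabulated answer $n = 91$ when, as here, the criterion is met only barely.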
ON A MULTICOMPARTMENT MIGRATION MODEL
WITH CHRONIC FEEDING¹


ALVIN D. WIGGINS


Hanford Laboratories Operation,
General Electric Company,
Richland, Washington, U.S.A.


1. SUMMARY 


In uptake and retention studies, the biosystem can often be thought 
of as comprising K distinct compartments with constant rates of passage 
of particles from any compartment to a different compartment. Before 
the theoretical uptake and retention curve for any compartment can 
be plotted and compared with the experimentally determined curve 
for the same compartment, it must be possible to estimate these rates 
from the data of the experiment. A mathematical model of such a 
system is deduced which is sufficiently general to comprehend both 
the chronic and acute feeding situations. A method is presented for 
estimating the migration rate constants. 

If $\lambda_{ij}$ denotes the theoretical constant rate of migration of particles from region $i$ to region $j$, and $\hat\lambda_{ij}$ denotes an estimate of $\lambda_{ij}$ calculated from the experimental data, then it is shown that


+++, # j, (1.1) 


where the meaning of the symbols occurring above is as follows: 

$t_v$ is the time, measured from the beginning of the experiment,
at which the $v$th observation of the system is made, $v = 1, \ldots,$
n — 1 = 0); 
is the size, measured in appropriate units, of the ith compartment 
or region (for example, total volume in milliliters of blood, or 
total mass in grams of bone), and is here assumed to be constant 
in a first approximation; 

$f_v = f(t_v)$, where $f(t)$ (the feeding function) measures the total amount of radioactivity administered to the system up to time $t$;
$u_k^{v}$ is the concentration of radioactivity in region $k$ at time $t_v$;
$w_{il}$ is the element in row $i$ and column $l$ in the inverse of the matrix

n-1
dD |,
v=1

¹ Work performed under Contract No. AT(45-1)-1350 between the Atomic Energy Commission and General Electric Company.


2. INTRODUCTION 


In Section 3 is derived the basic system of differential equations for the time-dependent transition probabilities $p_j(t)$ in the case in which a single particle is introduced into $R_1$ (Region 1). The $(K - 1)$-variate probability generating function for this case is presented, and with this as a starting point, the logarithm of the probability generating function for the chronic feeding case, a function of $s_1, \ldots, s_{K-1}$ and $t$, is derived. Using this logarithm, it is shown that, except for the presence of a nonhomogeneous term in the first equation, the expectations $u_j(t) = E[X_j(t)]$ satisfy the same system of differential equations as that satisfied by the $p_j(t)$.

Section 4 contains some results on powers and inverses of cyclic 
matrices which occur in the development. A novel type of matrix 
multiplication is defined, and an isomorphism between two classes of 
matrices is proved. 

In Section 5 an explicit procedure for estimating the rate constants $\lambda_{ij}$ is presented. The procedure makes use of the basic system of differential equations mentioned above, and the estimates are expressed in terms of the total numbers of particles in the various regions $R_i$ at times $t_v$.

A modified estimation procedure is presented in Section 6. The 
modification is necessitated by the fact that in practice the experimenter 
seldom observes total numbers of particles in any compartment at any 
time t. Rather he observes concentrations of particles at time t. This 
requires some additional knowledge of the relative sizes of the com- 
partments of the biosystem. 


3. GENERATING FUNCTIONS 
Suppose that a biological system consists of $K$ compartments or regions, $R_1, \ldots, R_K$, and let a single particle be introduced into $R_1$ at time $t = 0$. We assume that the behavior of the particle within the system is random and is described, as by Feller [2], in terms of a vector-valued Markov process $Y(t) = [Y_1(t), \ldots, Y_K(t)]$. Here $Y_i(t) = 1$ or $0$ according as the particle is or is not in $R_i$ at time $t$. If $Y_j(t) = 1$, then necessarily all other $Y_i(t)$ have the value 0, and we describe the situation by saying that the system is in state $j$ ($S_j$) at time $t$. We wish first to determine the transition probabilities

$$p_j(t) = \Pr\{Y_j(t) = 1\}, \qquad j = 1, \ldots, K. \tag{3.1}$$

Thus $p_j(t)$ denotes the probability of transition of the system from $S_1$ to $S_j$ during time $t$.

Let $\{\lambda_{ij}\}$, $i \neq j$, be a system of constants and assume that during a small increment of time of length $h$ the probability of transition from $S_i$ to $S_j$ is given by
$$\lambda_{ij}h + o(h). \tag{3.2}$$

Then we have the system of difference equations
$$p_j(t+h) = \sum_{i \neq j} p_i(t)\,\lambda_{ij}h + p_j(t)\,(1 - \lambda_j h) + o(h), \qquad j = 1, \ldots, K. \tag{3.3}$$


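From equation (3.3) and a passage to the limit as $h \to 0$, we obtain the system of differential equations
$$p_j'(t) = \sum_{i \neq j} \lambda_{ij}\,p_i(t) - \lambda_j\,p_j(t), \qquad j = 1, \ldots, K, \tag{3.4}$$
where
$$\lambda_j = \sum_{\substack{k=1\\ k \neq j}}^{K} \lambda_{jk}.$$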


We assume the system to be closed, hence we must have
$$\sum_{i=1}^{K} p_i(t) = 1. \tag{3.5}$$


It is well known that solutions of the system (3.4) are expressed in terms of linear combinations of exponential functions. However, in most cases of interest one is less interested in obtaining explicit solutions for the transition probabilities than in obtaining some information, by way of estimates, about the rate constants $\lambda_{ij}$. In particular, biologists are often interested in determining, say, the rate of deposition of harmful radioactivity in the bone. Consequently our main efforts will be directed toward the estimation of the constants $\lambda_{ij}$.
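Although explicit solutions are not needed for what follows, the forward equations (3.4) are easy to explore numerically. The sketch below is not part of the original paper: it integrates (3.4) in row-vector form $p'(t) = p(t)Q$, where $Q$ has off-diagonal entries $\lambda_{ij}$ and diagonal entries $-\lambda_j$, with a classical fourth-order Runge-Kutta step from $p(0) = (1, 0, \ldots, 0)$. The helper names `intensity_matrix` and `transition_probs` and the two-compartment rates are hypothetical. Because every row of $Q$ sums to zero, the closure condition (3.5) is preserved up to rounding.

```python
import numpy as np

def intensity_matrix(lam):
    """Q with Q[i, j] = lam[i, j] for i != j and Q[j, j] = -lambda_j = -sum_k lam[j, k]."""
    Q = np.array(lam, dtype=float)
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

def transition_probs(lam, t, steps=2000):
    """p(t) for a particle starting in R_1, from p'(t) = p(t) Q, i.e. system (3.4)."""
    Q = intensity_matrix(lam)
    p = np.zeros(Q.shape[0])
    p[0] = 1.0                          # p_1(0) = 1: the particle starts in R_1
    h = t / steps
    for _ in range(steps):              # classical fourth-order Runge-Kutta
        k1 = p @ Q
        k2 = (p + 0.5 * h * k1) @ Q
        k3 = (p + 0.5 * h * k2) @ Q
        k4 = (p + h * k3) @ Q
        p = p + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return p

# Hypothetical two-compartment rates: lambda_12 = 1.0, lambda_21 = 0.5.
lam = np.array([[0.0, 1.0],
                [0.5, 0.0]])
p = transition_probs(lam, 1.0)

# For K = 2, (3.4) reduces to p_1' = b - (a + b) p_1, whose solution is
# p_1(t) = b/(a+b) + a/(a+b) * exp(-(a+b) t) with a = lambda_12, b = lambda_21.
a, b = 1.0, 0.5
p1_exact = b / (a + b) + a / (a + b) * np.exp(-(a + b) * 1.0)
```

The same function works unchanged for any $K$; only the rate array grows.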
Let $f(t)$ be a monotone non-decreasing piecewise continuous function over the non-negative $t$-axis. The function $f$ will have the interpretation of a feeding function which measures the total quantity of particles introduced into the system up to time $t$. The probability generating function of the system (3.4) is easily seen to be
$$\sum_{i=1}^{K-1} p_i(t)\,s_i + p_K(t).$$

Suppose now that not one but $m$ particles are introduced into $R_1$ at time $t = 0$. If we assume that the particles behave independently within the system, then the generating function is
$$\left[\sum_{i=1}^{K-1} p_i(t)\,s_i + p_K(t)\right]^m.$$


Finally, let the time interval from 0 to $t$ be divided into $N$ subintervals each of length $\Delta t = N^{-1}t$, and suppose that at times $j\,\Delta t$, $j = 0, 1, \ldots, N-1$, there are $\Delta_j f$ particles introduced into $R_1$, where $\Delta_j f = f[(j+1)\Delta t] - f(j\,\Delta t)$. In this case the resulting probability generating function is
$$\prod_{j=0}^{N-1}\left[\sum_{i=1}^{K-1} p_i(t - j\,\Delta t)\,s_i + p_K(t - j\,\Delta t)\right]^{\Delta_j f}. \tag{3.6}$$


Taking logarithms of both members of (3.6), we have
$$\ln G(s_1, \ldots, s_{K-1}; \Delta t) = \sum_{j=0}^{N-1} \Delta_j f\,\ln\left\{\sum_{i=1}^{K-1} p_i(t - j\,\Delta t)\,s_i + p_K(t - j\,\Delta t)\right\}. \tag{3.7}$$


The right-hand member of (3.7) is seen to be a Riemann-Stieltjes sum approximating the integral
$$\int_0^t \ln\left\{\sum_{i=1}^{K-1} p_i(t-\tau)\,s_i + p_K(t-\tau)\right\} df(\tau).$$


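Consequently, if we pass to the limit as $N \to \infty$ in equation (3.7), and denote the resulting left-hand member by $\ln G_f(s_1, \ldots, s_{K-1}; t)$, we have the equation
$$L_f(s_1, \ldots, s_{K-1}; t) = \ln G_f(s_1, \ldots, s_{K-1}; t) = \int_0^t \ln\left\{\sum_{i=1}^{K-1} p_i(t-\tau)\,s_i + p_K(t-\tau)\right\} df(\tau). \tag{3.8}$$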
In equation (3.8) we interpret the right-hand member to be the logarithm of the probability generating function corresponding to the situation in which large numbers of particles are introduced into the biological system over an extended period of time, the so-called chronic feeding case.

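Now let $X(t) = [X_1(t), \ldots, X_K(t)]$ be a vector-valued stochastic process, the $i$th component of which, $X_i(t)$, denotes the number of particles in $R_i$ at time $t$. From the manner in which equation (3.8) was derived it is easy to see that the right-hand member is the logarithm of the probability generating function of $X(t)$. If we let $\mu(t) = E[X(t)] = [\mu_1(t), \ldots, \mu_K(t)]$, then, since the expression in braces in (3.8) equals 1 when $s_1 = \cdots = s_{K-1} = 1$,
$$\mu_i(t) = \left.\frac{\partial L_f}{\partial s_i}\right|_{s_1 = \cdots = s_{K-1} = 1} = \int_0^t p_i(t-\tau)\,df(\tau), \qquad i = 1, \ldots, K-1, \tag{3.9}$$
and in the same way $\mu_K(t) = \int_0^t p_K(t-\tau)\,df(\tau)$.

The condition (3.5), together with equations (3.9), implies
$$\sum_{i=1}^{K} \mu_i(t) = f(t).$$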


Let us complete the definition of $f$ by putting $f(t) = 0$ for all $t < 0$. Suppose that $f$ is differentiable almost everywhere with respect to Lebesgue measure, and let $D(f)$ denote the set of all points at which the derivative of $f$ exists. We now restrict ourselves to the set of all points $t \in D(f)$. Assuming that we can differentiate under the integral sign, we have upon differentiating both members of (3.9)
$$\mu_i'(t) = p_i(0+)\,f'(t) + \int_0^t p_i'(t-\tau)\,df(\tau), \qquad i = 1, \ldots, K, \quad t \in D(f). \tag{3.10}$$


Replacing $p_i'$ by its expression from (3.4) and making use of (3.9) again, we have
$$\mu_j'(t) = \sum_{i \neq j} \lambda_{ij}\,\mu_i(t) - \lambda_j\,\mu_j(t) + p_j(0+)\,f'(t), \qquad j = 1, \ldots, K. \tag{3.11}$$
Thus, except for the presence of the non-homogeneous term $p_j(0+)f'(t)$, the expectations $\mu_j(t)$ satisfy the same system of differential equations as that satisfied by the transition probabilities $p_j(t)$. It is this fact which we shall use as the basis for estimating the constants $\lambda_{ij}$.
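The claim can be checked numerically in a special case. In the sketch below, which is an illustration outside the original text, the feeding is assumed to be constant, $f(t) = ct$, so that $f'(t) = c$ and $p_j(0+) = \delta_{j1}$. Then $\mu(T)$ computed from (3.9) as $c\int_0^T p(s)\,ds$ should agree with the solution of (3.11) started at $\mu(0) = 0$. The three-compartment rates, the value of $c$, and the helper name `rk4_path` are hypothetical; integration uses RK4 and the trapezoidal rule.

```python
import numpy as np

def rk4_path(deriv, x0, t_end, steps):
    """Classical fourth-order Runge-Kutta for x' = deriv(x); returns every step."""
    h = t_end / steps
    path = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        x = path[-1]
        k1 = deriv(x)
        k2 = deriv(x + 0.5 * h * k1)
        k3 = deriv(x + 0.5 * h * k2)
        k4 = deriv(x + h * k3)
        path.append(x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(path)

# Hypothetical rates lam[i, j] for S_i -> S_j, K = 3 (diagonal unused).
lam = np.array([[0.0, 0.8, 0.3],
                [0.2, 0.0, 0.5],
                [0.1, 0.4, 0.0]])
Q = lam - np.diag(lam.sum(axis=1))       # each row of Q sums to zero
c, T, steps = 2.0, 4.0, 4000              # feeding f(t) = c t, evaluated at t = T
h = T / steps
e1 = np.array([1.0, 0.0, 0.0])            # p_j(0+) = delta_j1

# Route 1, eq. (3.9): mu_i(T) = c * integral_0^T p_i(s) ds (trapezoidal rule).
p_path = rk4_path(lambda p: p @ Q, e1, T, steps)
mu_from_3_9 = c * h * (p_path[1:-1].sum(axis=0) + 0.5 * (p_path[0] + p_path[-1]))

# Route 2, eq. (3.11) in row form: mu' = mu Q + c e_1, with mu(0) = 0.
mu_from_3_11 = rk4_path(lambda mu: mu @ Q + c * e1, np.zeros(3), T, steps)[-1]
```

The two routes agree to the accuracy of the quadrature, and the components of `mu_from_3_11` sum to $f(T) = cT$, as the closure condition requires.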


4. SPECIAL MATRIX INVERSIONS


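We proceed now to derive some results which will be useful in the inversion of some matrices which will occur in the sequel. Let $\mathfrak{M}$ be the class of block matrices $\mathfrak{A}$ defined as follows,
$$\mathfrak{A} = \begin{bmatrix} \alpha_{11}I & \cdots & \alpha_{1n}I\\ \vdots & & \vdots\\ \alpha_{m1}I & \cdots & \alpha_{mn}I \end{bmatrix},$$
where the $\alpha_{ij}$ are scalars, $I$ denotes the unit matrix of fixed order $r$, and $m$ and $n$ are arbitrary positive integers. Let $\mathfrak{N}$ denote the class of all matrices $a$ having scalar entries,
$$a = \begin{bmatrix} a_{11} & \cdots & a_{1q}\\ \vdots & & \vdots\\ a_{p1} & \cdots & a_{pq} \end{bmatrix},$$
where $p$ and $q$ are arbitrary positive integers. With every $\mathfrak{A} \in \mathfrak{M}$ we can associate a unique element $a \in \mathfrak{N}$ as follows: given that $\mathfrak{A} = [\alpha_{ij}I]$, choose $a_{ij} = \alpha_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, so that
$$a = [a_{ij}] = [\alpha_{ij}].$$

It is obvious that by reversing the above procedure we can associate with each $a \in \mathfrak{N}$ a unique element $\mathfrak{A} \in \mathfrak{M}$, so that there exists a one-to-one correspondence between the elements of $\mathfrak{M}$ and the elements of $\mathfrak{N}$. We write $\mathfrak{A} \leftrightarrow a$ to express this correspondence. For any positive integer $m$ define $\bar m = mr$. Let $\mathfrak{A}$ and $\mathfrak{B}$ be in $\mathfrak{M}$, where $\mathfrak{A}$ is an $\bar m \times \bar p$ matrix and $\mathfrak{B}$ is a $\bar p \times \bar n$ matrix. Suppose that $\mathfrak{A} \leftrightarrow a$ and $\mathfrak{B} \leftrightarrow b$. Then it is easy to see that $\mathfrak{A}\mathfrak{B} \leftrightarrow ab$. For,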
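$$\mathfrak{A}\mathfrak{B} = \begin{bmatrix} \left(\sum_k \alpha_{1k}\beta_{k1}\right)I & \cdots & \left(\sum_k \alpha_{1k}\beta_{kn}\right)I\\ \vdots & & \vdots\\ \left(\sum_k \alpha_{mk}\beta_{k1}\right)I & \cdots & \left(\sum_k \alpha_{mk}\beta_{kn}\right)I \end{bmatrix} \leftrightarrow ab.$$

Similarly, if $\mathfrak{A}$ and $\mathfrak{B}$ are both $\bar m \times \bar n$ and $\mathfrak{A} \leftrightarrow a$, $\mathfrak{B} \leftrightarrow b$, then
$$\mathfrak{A} + \mathfrak{B} = \begin{bmatrix} (\alpha_{11} + \beta_{11})I & \cdots & (\alpha_{1n} + \beta_{1n})I\\ \vdots & & \vdots\\ (\alpha_{m1} + \beta_{m1})I & \cdots & (\alpha_{mn} + \beta_{mn})I \end{bmatrix} \leftrightarrow a + b.$$

Using the symbol $\cong$ to express an isomorphism between two classes, we have just proved

Theorem 1. $\mathfrak{M} \cong \mathfrak{N}$ with respect to matrix multiplication and addition.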
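In present-day notation a block matrix $[\alpha_{ij}I]$ of the class $\mathfrak{M}$ is the Kronecker product of the scalar matrix $a = [\alpha_{ij}]$ with $I_r$, the product mentioned in the footnote on Roy and Sarhan and on Murnaghan. The sketch below illustrates Theorem 1 numerically and is not part of the proof. It uses NumPy's `np.kron`; the helper names `to_block` and `from_block` and the random test matrices are hypothetical.

```python
import numpy as np

def to_block(a, r):
    """Image of the scalar matrix a in the block class: a_ij becomes a_ij * I_r."""
    return np.kron(np.asarray(a, dtype=float), np.eye(r))

def from_block(A, r):
    """Inverse map: read a_ij off the (0, 0) entry of the (i, j) block of [a_ij I_r]."""
    return np.asarray(A)[::r, ::r].copy()

rng = np.random.default_rng(0)
r = 3
a = rng.normal(size=(2, 4))
b = rng.normal(size=(4, 5))
c = rng.normal(size=(2, 4))

# Products and sums of images are images of products and sums (Theorem 1).
prod_ok = np.allclose(to_block(a, r) @ to_block(b, r), to_block(a @ b, r))
sum_ok = np.allclose(to_block(a, r) + to_block(c, r), to_block(a + c, r))
roundtrip_ok = np.allclose(from_block(to_block(a, r), r), a)
```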

Now let $A_i$ denote the $(K-1) \times (K-1)$ matrix, shown here for $i = 1$,
$$A_1 = \begin{bmatrix} -1 & -1 & \cdots & -1 & -1\\ 1 & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix},$$
wherein, for $i = 1, \ldots, K-1$, each element of the $i$th row is minus one; for $i = 1$, the remainder of the matrix consists of ones down the first diagonal below the principal diagonal and zeros elsewhere; for $1 < i \le K-1$, $A_i$ is the matrix obtained from $A_1$ by repeatedly interchanging the row containing the minus ones with the row below it until the $i$th row is reached. We now prove

Theorem 2. For $i = 1, \ldots, K-1$, the matrix $A_i$ is cyclic with period $K + 1 - i$.

Proof. We must show that $A_i^{K+1-i} = I$, where $I$ denotes the unit matrix of order $K - 1$. The procedure will be as follows. We shall first show that the characteristic polynomial of $A_1$ is the cyclotomic polynomial of order $K - 1$,
$$\xi^{K-1} + \xi^{K-2} + \cdots + \xi + 1. \tag{4.1}$$
We shall then appeal to the Cayley-Hamilton theorem, which states that every square matrix satisfies its characteristic equation, in order to show that $A_1$ is cyclic with period $K$. The results for $A_i$ with $i > 1$ will follow directly from that for $A_1$.

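That the characteristic polynomial of $A_1$ is the cyclotomic polynomial (4.1) follows immediately from the fact that $A_1$ is the companion matrix [1, p. 308] of this polynomial. By the Cayley-Hamilton theorem we have
$$A_1^{K-1} + A_1^{K-2} + \cdots + A_1 + I = 0. \tag{4.2}$$
We solve this equation first for $A_1^{K-1}$, then for $I$:
$$A_1^{K-1} = -(I + A_1 + \cdots + A_1^{K-2}), \tag{4.3}$$
$$I = -(A_1 + A_1^2 + \cdots + A_1^{K-1}) = A_1\left[-(I + A_1 + \cdots + A_1^{K-2})\right] = A_1A_1^{K-1} = A_1^{K}. \tag{4.4}$$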


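This proves the theorem for $i = 1$. For $1 < i \le K-1$ write $A_i$ in the form of a block matrix:
$$A_i = \begin{bmatrix} I_{i-1} & 0\\ U & V \end{bmatrix},$$
where $I_{i-1}$ is the unit matrix of order $i - 1$, $0$ denotes the $(i-1) \times (K-i)$ null matrix, $U$ denotes the $(K-i) \times (i-1)$ matrix
$$U = \begin{bmatrix} -1 & \cdots & -1\\ 0 & \cdots & 0\\ \vdots & & \vdots\\ 0 & \cdots & 0 \end{bmatrix},$$
and $V$ denotes the $(K-i) \times (K-i)$ matrix
$$V = \begin{bmatrix} -1 & -1 & \cdots & -1 & -1\\ 1 & 0 & \cdots & 0 & 0\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}.$$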

Note that $V$ and the matrix $A_1$ have exactly the same form, differing only in their respective orders. This similarity of form will be exploited in the remainder of the proof. It is easily seen that the $n$th power of $A_i$ can be written in the form
$$A_i^n = \begin{bmatrix} I_{i-1} & 0\\ U_n & V^n \end{bmatrix},$$
where $V^n$ is the $n$th power of $V$ and $U_n$ is determined as follows. Writing $A_i^{n+1} = A_iA_i^n = A_i^nA_i$, we have the two forms for $U_{n+1}$:
$$U_{n+1} = UI_{i-1} + VU_n, \tag{4.5}$$
$$U_{n+1} = U_nI_{i-1} + V^nU. \tag{4.6}$$


Equating the right-hand members of (4.5) and (4.6) and rearranging terms, we have
$$(I_{K-i} - V)\,U_n = (I_{K-i} - V^n)\,U. \tag{4.7}$$
Since $V$ has the same form as $A_1$, its characteristic polynomial is
$$\varphi(\xi) = 1 + \xi + \cdots + \xi^{K-i},$$
which is non-vanishing for $\xi = 1$. It follows that the determinant $|I_{K-i} - V| = \varphi(1) = K + 1 - i \neq 0$, hence that $(I_{K-i} - V)$ is non-singular. Consequently we can write its inverse. Thus from (4.7) we obtain
$$U_n = (I_{K-i} - V)^{-1}(I_{K-i} - V^n)\,U. \tag{4.8}$$
Since $V$ is of order $K - i$, $V$ is cyclic with period $K + 1 - i$, by the first part of the proof, hence
$$V^{K+1-i} = I_{K-i}. \tag{4.9}$$
Setting $n = K + 1 - i$ in (4.8) we have
$$U_{K+1-i} = (I_{K-i} - V)^{-1}(I_{K-i} - I_{K-i})\,U = 0, \tag{4.10}$$
where $0$ denotes the $(K-i) \times (i-1)$ null matrix. Therefore
$$A_i^{K+1-i} = \begin{bmatrix} I_{i-1} & 0\\ 0 & I_{K-i} \end{bmatrix} = I.$$
This completes the proof.

Corollary. $A_i^{-1} = A_i^{K-i}$, $i = 1, \ldots, K-1$.
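Theorem 2 and its corollary can be spot-checked numerically for small $K$. In this sketch, which is not from the paper, the hypothetical helper `a_matrix(K, i)` builds $A_i$ exactly as described above (row of minus ones moved down to row $i$, the other rows kept in order), and `smallest_period` finds the least $k$ with $A_i^k = I$. Exact comparison is safe because all entries are small integers.

```python
import numpy as np

def a_matrix(K, i):
    """A_i of order K - 1 (1-based i): row i holds minus ones; the other rows are
    the unit rows of A_1 below its first row, kept in their original order."""
    n = K - 1
    A1 = np.zeros((n, n))
    A1[0, :] = -1.0
    A1[np.arange(1, n), np.arange(n - 1)] = 1.0   # ones on the first subdiagonal
    rest = list(range(1, n))
    # Moving the row of minus ones down by adjacent interchanges until it is
    # the i-th row leaves the remaining rows in their original order.
    return A1[rest[:i - 1] + [0] + rest[i - 1:], :]

def smallest_period(A, max_power=100):
    """Least k >= 1 with A^k = I, or None if none is found up to max_power."""
    I = np.eye(A.shape[0])
    P = I.copy()
    for k in range(1, max_power + 1):
        P = P @ A
        if np.array_equal(P, I):
            return k
    return None

periods = {(K, i): smallest_period(a_matrix(K, i))
           for K in range(2, 9) for i in range(1, K)}

# Corollary: A_i^{K-i} is the inverse of A_i.
inverse_ok = all(
    np.array_equal(a_matrix(K, i) @ np.linalg.matrix_power(a_matrix(K, i), K - i),
                   np.eye(K - 1))
    for K in range(2, 9) for i in range(1, K))
```

For example, `a_matrix(4, 2)` has rows $(1, 0, 0)$, $(-1, -1, -1)$, $(0, 1, 0)$ and period 3.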
5. ESTIMATION PROCEDURE 


The procedure for estimating the constants $\lambda_{ij}$ is relatively straightforward. It consists essentially of regarding the system (3.11) as a system of algebraic equations in the unknowns $\lambda_{ij}$, of replacing the quantities $\mu_j(t)$ and $\mu_j'(t)$, respectively, by observed values and difference quotients constructed from the observed values of the random variables, and finally of writing a sufficient number of equations to solve the resulting system. Let $T \in D(f)$ and choose a sequence $\{t_\nu,\ \nu = 0, 1, \ldots, n\}$ of distinct time points on the set $\{D(f) \cap [0, T]\} \cup \{t_0\}$, where $t_0 = 0$ and $t_n = T$. Let the process $X(t) = [X_1(t), \ldots, X_K(t)]$ be observed at each time $t_\nu$, and denote by $u_\nu = (u_{1\nu}, \ldots, u_{K\nu})$ the observed value of $X(t)$ at $t = t_\nu$. With the initial conditions $p_1(0+) = 1$, $p_j(0+) = 0$ for $j > 1$, let
$$v_{j\nu} = \frac{u_{j,\nu+1} - u_{j,\nu-1}}{t_{\nu+1} - t_{\nu-1}} - \delta_j\,f'(t_\nu), \qquad \nu = 1, \ldots, n-1,$$
where $\delta_1 = 1$ and $\delta_j = 0$ for $j \neq 1$.

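Consider now equations (3.11). Due to condition (3.5) this system is dependent. We can obtain an independent system by eliminating one of the equations, and suppose for convenience that it is the last equation. Thus, the subscript $j$ runs from 1 to $K - 1$. We now recast the system (3.11) as a system of algebraic equations in the unknowns $\lambda_{ij}$ with the functions $\mu_i(t)$ as coefficients. At the same time we replace $\mu_i(t_\nu)$ by $u_{i\nu}$, $\mu_j'(t_\nu) - p_j(0+)f'(t_\nu)$ by $v_{j\nu}$, and $\lambda_{ij}$ by $\hat\lambda_{ij}$, displaying each of the unknowns $\hat\lambda_{ij}$ in a column by itself, as is usually done in linear systems of equations. For a fixed value of $\nu$, the result will be a system of $K - 1$ algebraic equations in the $K(K-1)$ unknowns $\hat\lambda_{ij}$. In vector and matrix form the system is partially written out as follows for $\nu = 1$:
$$\left[\begin{array}{ccccc|c|cccc}
-u_{11} & -u_{11} & \cdots & -u_{11} & -u_{11} & \cdots & u_{K1} & 0 & \cdots & 0\\
u_{11} & 0 & \cdots & 0 & 0 & \cdots & 0 & u_{K1} & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & u_{11} & 0 & \cdots & 0 & 0 & \cdots & u_{K1}
\end{array}\right]
\begin{bmatrix} \hat\lambda_{12}\\ \vdots\\ \hat\lambda_{1K}\\ \hline \vdots\\ \hline \hat\lambda_{K1}\\ \vdots\\ \hat\lambda_{K,K-1} \end{bmatrix}
=
\begin{bmatrix} v_{11}\\ v_{21}\\ \vdots\\ v_{K-1,1} \end{bmatrix} \tag{5.3}$$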


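The vertical lines in the matrix and the horizontal lines in the vector of $\hat\lambda$'s have been drawn to suggest a more or less natural division into blocks, for reasons which will become apparent. Notice that the first block has the form $u_{11}A_1$, where $A_1$ is a square matrix of order $K - 1$ of the type considered in Section 4 above. Similarly the second block can be written $u_{21}A_2$. In general, for $i = 1, \ldots, K-1$, the $i$th block is of the form $u_{i1}A_i$. Finally the $K$th block has the form $u_{K1}I_{K-1}$, where $I_{K-1}$ is, as before, the unit matrix of order $K - 1$. We now let $\nu$ assume each of the values $2, \ldots, n-1$, and adjoin to the system (5.3) above $n - 2$ additional systems of $K - 1$ equations, so that, in block matrix form, the complete system can be written
$$\begin{bmatrix} u_{11}A_1 & u_{21}A_2 & \cdots & u_{K-1,1}A_{K-1} & u_{K1}I_{K-1}\\ \vdots & \vdots & & \vdots & \vdots\\ u_{1,n-1}A_1 & u_{2,n-1}A_2 & \cdots & u_{K-1,n-1}A_{K-1} & u_{K,n-1}I_{K-1} \end{bmatrix}
\begin{bmatrix} \hat\lambda_{12}\\ \vdots\\ \hat\lambda_{1K}\\ \vdots\\ \hat\lambda_{K1}\\ \vdots\\ \hat\lambda_{K,K-1} \end{bmatrix}
=
\begin{bmatrix} v_{11}\\ \vdots\\ v_{K-1,1}\\ \vdots\\ v_{1,n-1}\\ \vdots\\ v_{K-1,n-1} \end{bmatrix}. \tag{5.4}$$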


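Abbreviating still further, we rewrite (5.4) in an obvious notation as follows
$$B\hat\lambda = v, \tag{5.5}$$
and fix our attention on the $(n-1)(K-1) \times K(K-1)$ matrix $B$, where we assume $(n - 1) > K$. The matrix $B$ can be written as follows
$$B = \begin{bmatrix} u_{11}I_{K-1} & \cdots & u_{K1}I_{K-1}\\ \vdots & & \vdots\\ u_{1,n-1}I_{K-1} & \cdots & u_{K,n-1}I_{K-1} \end{bmatrix}
\begin{bmatrix} A_1 & 0 & \cdots & 0 & 0\\ 0 & A_2 & \cdots & 0 & 0\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & A_{K-1} & 0\\ 0 & 0 & \cdots & 0 & I_{K-1} \end{bmatrix} = \mathfrak{u}A, \tag{5.6}$$
where $\mathfrak{u}$ is an element of the class $\mathfrak{M}$ of matrices considered in Section 4.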
With the aid of the isomorphism which was established therein, we now define a new operation of matrix multiplication,³
$$\mathfrak{u}N = u \otimes N, \tag{5.7}$$
where $N$ is a block matrix and $u$ is the image in $\mathfrak{N}$ of $\mathfrak{u}$ under the isomorphism. By the right-hand side of (5.7) we mean the usual row-into-column operation of matrix multiplication, with the understanding that this is a row of scalars into a column of blocks $N_{kj}$, $k = 1, \ldots, K$. A typical entry in the matrix $M = u \otimes N$ will be
$$M_{ij} = \sum_{k=1}^{K} u_{ik}N_{kj}, \qquad i = 1, \ldots, n-1, \quad j = 1, \ldots, K. \tag{5.8}$$
The right-hand side of (5.7) will be meaningful if and only if the number of columns of $u$ is equal to the number of rows of blocks of $N$. If $u$ is $(n-1) \times K$ and $N$ has $K$ rows of blocks and $K$ columns of blocks, then $M$ will have $n - 1$ rows of blocks with $K$ blocks in each row. Obviously this definition can be extended to the case in which $N$ is “rectangular” with regard to the number of rows and columns of blocks, and to the case of right as well as left multiplication. It should be noted further that each of the sums in the right-hand member of (5.8) must be meaningful in order that the definition (5.7) be so.
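The product defined by (5.7) and (5.8) can be coded directly from its definition. In the sketch below the function name `circle_times` is hypothetical and $N$ is passed as a nested list of equally sized blocks. The check confirms that the result coincides with the ordinary product of the Kronecker product of $u$ with a unit matrix and the assembled block matrix $N$, which is the correspondence $\mathfrak{u} \leftrightarrow u$ of Section 4.

```python
import numpy as np

def circle_times(u, N_blocks):
    """u (x) N per (5.8): u is a scalar matrix with K columns, N_blocks is a nested
    list with K rows of equally sized blocks; returns M as a nested list of blocks."""
    u = np.asarray(u, dtype=float)
    K = len(N_blocks)
    if u.shape[1] != K:
        raise ValueError("columns of u must equal the number of block rows of N")
    n_cols = len(N_blocks[0])
    return [[sum(u[i, k] * np.asarray(N_blocks[k][j], dtype=float) for k in range(K))
             for j in range(n_cols)]
            for i in range(u.shape[0])]

rng = np.random.default_rng(0)
u = rng.normal(size=(3, 2))                                    # scalar matrix, K = 2
N_blocks = [[rng.normal(size=(2, 3)) for _ in range(2)] for _ in range(2)]
M = np.block(circle_times(u, N_blocks))                         # assemble the blocks
same_as_kron = np.allclose(M, np.kron(u, np.eye(2)) @ np.block(N_blocks))
```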

It will be useful to have a word to compare the internal structures of two matrices. Accordingly, if $P$ and $Q$ are both block matrices or are both matrices with scalar entries, we shall say that $P$ and $Q$ are of similar fabric. Otherwise we shall say that $P$ and $Q$ are of dissimilar fabric. In particular if $P$ is a scalar matrix and $Q$ a block matrix we shall say that $P$ is of finer fabric than $Q$, or equivalently that $Q$ is of coarser fabric than $P$. It is recognized that there is some degree of ambiguity in this definition, since any scalar matrix can, by an arbitrary partitioning, be considered as a block matrix, and conversely. Nevertheless in subsequent usage the meaning should be clear from the text.

³ I am indebted to one of the referees, who pointed out that this type of multiplication is handled by Roy and Sarhan [6]. It has subsequently come to my attention that a similar product, called the Kronecker product of matrices, is treated in Chapter III of Murnaghan [5].
Equation (5.5) can now be rewritten isomorphically as
$$u \otimes A\hat\lambda = v. \tag{5.9}$$

Now multiply each member of equation (5.9) on the left by $u'$, the transpose of $u$:
$$(u'u) \otimes A\hat\lambda = u' \otimes v. \tag{5.10}$$

In order that the right-hand member of (5.10) remain meaningful, the vector $v$ must be thought of as partitioned into $n - 1$ subvectors each having $K - 1$ components. The multiplication then proceeds as defined in (5.7) with $v$ taken to be a one-column block matrix. All of the remaining problems in determining the estimate $\hat\lambda$ of the vector $\lambda$ are connected with the inversion of the matrices occurring in the left-hand member of (5.10). Since $u'u$ is a $K \times K$ matrix, it is non-singular if and only if there is at least one non-vanishing subdeterminant of $u$ of order $K$. One would like, then, to prove some such notion as that embodied in the following vague statement: “For any integer $n > K$, the set of all $X(t)$ for which every subdeterminant of $u$ of order $K$ vanishes is a set of measure zero,” for some appropriately determined probability measure on the sample space of vector functions $X(t)$. So far the author has been unable to construct any such proof. We shall simply assume that the experiment has yielded values such that there is indeed at least one non-vanishing subdeterminant of $u$ of order $K$. In this case equation (5.10) becomes, upon multiplication by $(u'u)^{-1}$,
$$\begin{bmatrix} A_1 & 0 & \cdots & 0\\ 0 & A_2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & I_{K-1} \end{bmatrix}\hat\lambda = (u'u)^{-1}u' \otimes v. \tag{5.11}$$
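Finally the inversion of the block matrix in the left-hand member of (5.11) follows immediately upon application of the corollary to Theorem 2 above, and we have
$$\hat\lambda = \begin{bmatrix} A_1^{K-1} & 0 & \cdots & 0 & 0\\ 0 & A_2^{K-2} & \cdots & 0 & 0\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & A_{K-1} & 0\\ 0 & 0 & \cdots & 0 & I_{K-1} \end{bmatrix}(u'u)^{-1}u' \otimes v$$
as our estimate of the vector $\lambda$.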
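The whole estimator fits in a few lines. The sketch below is a hedged illustration, not the author's code. The function `estimate_rates` (a hypothetical name) takes the $(n-1) \times K$ array $u$ of observed values and the $(n-1) \times (K-1)$ array of right-hand sides $v_{j\nu}$. It forms $w = (u'u)^{-1}$ and then applies $A_i^{K-i} = A_i^{-1}$ block by block, taking $A_K = I_{K-1}$. To make the result checkable, the right-hand sides are generated exactly from invented rates through (3.11), so the rates should be recovered to rounding error. The last lines cross-check against a generic least-squares solve of (5.5) with $B = (u \otimes_{\text{Kronecker}} I_{K-1})\,\mathrm{diag}(A_1, \ldots, A_{K-1}, I_{K-1})$.

```python
import numpy as np

def a_matrix(K, i):
    """A_i of Section 4 (1-based i); A_K is taken to be the unit matrix I_{K-1}."""
    n = K - 1
    if i == K:
        return np.eye(n)
    A1 = np.zeros((n, n))
    A1[0, :] = -1.0
    A1[np.arange(1, n), np.arange(n - 1)] = 1.0
    rest = list(range(1, n))
    return A1[rest[:i - 1] + [0] + rest[i - 1:], :]

def estimate_rates(u, v):
    """u: (n-1) x K observed values (row nu is u_nu); v: (n-1) x (K-1) values v_{j nu}.
    Returns a K x K array whose (i, j) entry is lambda_hat_ij (diagonal unused)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    K = u.shape[1]
    w = np.linalg.inv(u.T @ u)       # needs a non-vanishing K x K subdeterminant of u
    g = w @ u.T @ v                  # row i: sum_k w_ik sum_nu u_{k nu} v_nu
    lam = np.zeros((K, K))
    for i in range(1, K + 1):
        block = np.linalg.matrix_power(a_matrix(K, i), K - i) @ g[i - 1]
        lam[i - 1, [j for j in range(K) if j != i - 1]] = block
    return lam

# Hypothetical rates lambda_ij (row i = source compartment), K = 3.
lam_true = np.array([[0.0, 0.8, 0.3],
                     [0.2, 0.0, 0.5],
                     [0.1, 0.4, 0.0]])
Q = lam_true - np.diag(lam_true.sum(axis=1))
rng = np.random.default_rng(1)
u = rng.uniform(0.5, 2.0, size=(6, 3))    # stand-in observations, n - 1 = 6 rows
v = (u @ Q)[:, :2]                        # exact right-hand sides of (3.11), j = 1, 2
lam_hat = estimate_rates(u, v)

# Cross-check against a generic least-squares solve of B lambda = v, eq. (5.5).
A_big = np.zeros((6, 6))
for i in range(1, 4):
    A_big[2 * (i - 1):2 * i, 2 * (i - 1):2 * i] = a_matrix(3, i)
B = np.kron(u, np.eye(2)) @ A_big
lam_vec = np.linalg.lstsq(B, v.reshape(-1), rcond=None)[0]
lstsq_ok = np.allclose(lam_vec, np.concatenate([np.delete(lam_hat[i], i) for i in range(3)]))
```

With real data the right-hand sides would instead come from the difference quotients $v_{j\nu}$, and the result is then an estimate rather than an exact recovery.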


6. MODIFIED PROCEDURE 


In its present form the above procedure is inapplicable to many biological problems. The reason is that the random variables $X_i(t)$ measure the total quantity of tracer substance in the regions $R_i$ at time $t$, whereas the biologist often measures the concentration at time $t$. In order to make the above procedure useful to the biologist, the random variables must be modified.

By an appropriate change of scale, the same change being applied to each $X_i(t)$, the components $X_i(t)$ of the vector stochastic process $X(t)$ may now be interpreted as total numbers of microcuries, rather than as total numbers of radioactive particles, in the regions $R_i$ at time $t$. More generally, the interpretation may be made in terms of whatever number is used in the numerator in the calculation of a concentration ratio. This, however, is not the case with the denominator. For $i = 1, \ldots, K$ let $m_i$ be a quantity which measures the size of $R_i$. For example, $m_i$ may represent total blood volume measured in milliliters, or total mass of bone measured in grams, or total biomass of the $i$th trophic level in an ecosystem. The concentration of particles in $R_i$ at time $t$ can now be represented by a new random variable $Z_i(t)$ defined by
$$Z_i(t) = X_i(t)/m_i, \tag{6.1}$$
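so that the new vector-valued stochastic process $Z(t)$ is now written
$$Z(t) = [X_1(t)/m_1, \ldots, X_K(t)/m_K]. \tag{6.2}$$

Let $\varphi_i(t)$ be defined by
$$\varphi_i(t) = E[Z_i(t)] = \mu_i(t)/m_i, \qquad i = 1, \ldots, K. \tag{6.3}$$

For $i = 1, \ldots, K$, let each $m_i$ be assumed to be a constant independent of time. Then the system (3.11) becomes
$$m_j\varphi_j'(t) = \sum_{i \neq j}(m_i\lambda_{ij})\,\varphi_i(t) - (m_j\lambda_j)\,\varphi_j(t) + p_j(0+)\,f'(t), \qquad j = 1, \ldots, K. \tag{6.4}$$
Let new quantities $u^*_{i\nu}$, $v^*_{j\nu}$ and $\eta_{ij}$ be defined by
$$u_{i\nu} = m_iu^*_{i\nu}, \qquad v_{j\nu} = m_jv^*_{j\nu}, \qquad \eta_{ij} = m_i\lambda_{ij}. \tag{6.5}$$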


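Let $A$ and $m$ respectively denote the matrices
$$A = \begin{bmatrix} A_1 & 0 & \cdots & 0\\ 0 & A_2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & I_{K-1} \end{bmatrix}, \tag{6.6}$$
$$m = \begin{bmatrix} m_1 & 0 & \cdots & 0\\ 0 & m_2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & m_K \end{bmatrix}, \tag{6.7}$$
and let $u^*$, $\eta$ and $v^*$ denote the matrix and vectors corresponding respectively to $u$, $\lambda$ and $v$. In fact we have the relations
$$u = u^*m, \qquad \lambda = m^{-1} \otimes \eta, \qquad v = M^*v^*, \tag{6.8}$$
where $m^*$ is the $(K-1) \times (K-1)$ matrix obtained from $m$ by deleting the last row and column therefrom,
$$m^* = \begin{bmatrix} m_1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & m_{K-1} \end{bmatrix}, \tag{6.9}$$
and $M^*$ is a square block matrix containing the block $m^*$ repeated $n - 1$ times down the main diagonal and blocks of zeros elsewhere.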


By a straightforward substitution, the estimating equation (5.9) now becomes
$$u^*m \otimes A \otimes m^{-1} \otimes \hat\eta = M^*v^*. \tag{6.10}$$

Since $m$ is diagonal and $A$ is a diagonal of blocks, $m$ and $A$ “commute,” hence (6.10) reduces to
$$u^* \otimes A\hat\eta = M^*v^*. \tag{6.11}$$
Proceeding as in Section 5 we have
$$\hat\eta = A^{-1}\left[(u^{*\prime}u^*)^{-1}u^{*\prime} \otimes M^*v^*\right]. \tag{6.12}$$

Multiplying equation (6.12) by $m^{-1}$ and making use of the second of the relations (6.8) we have finally
$$\hat\lambda = m^{-1} \otimes A^{-1}\left[(u^{*\prime}u^*)^{-1}u^{*\prime} \otimes M^*v^*\right]. \tag{6.13}$$
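From (6.13) we obtain the following explicit expressions for the rate constant estimates $\hat\lambda_{ij}$. Write $\hat\lambda_i = (\hat\lambda_{i1}, \ldots, \hat\lambda_{iK})'$ with $\hat\lambda_{ii}$ omitted, and put $A_K = I_{K-1}$, so that $A_i^{-1} = A_i^{K-i}$ for every $i = 1, \ldots, K$ (by the corollary to Theorem 2 for $i < K$, trivially for $i = K$). Then
$$\hat\lambda_i = \frac{1}{m_i}\,A_i^{K-i}\sum_{k=1}^{K}\sum_{\nu=1}^{n-1} w_{ik}\,u^*_{k\nu}\,\mathbf{v}_\nu, \qquad i = 1, \ldots, K,$$
where $\mathbf{v}_\nu$ is the vector with components
$$v_{j\nu} = m_j\,\frac{u^*_{j,\nu+1} - u^*_{j,\nu-1}}{t_{\nu+1} - t_{\nu-1}} - \delta_j\,f'(t_\nu), \qquad j = 1, \ldots, K-1,$$
$w_{ik}$ is the element in the intersection of row $i$ and column $k$ of the inverse of the matrix
$$\left[\sum_{\nu=1}^{n-1} u^*_{i\nu}\,u^*_{k\nu}\right],$$
and $\delta_j = 1$ if $j = 1$, $\delta_j = 0$ if $j \neq 1$.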

Note that the right-hand member of (6.13) contains a sequence of matrix multiplications, some of the usual type and the remainder of the type defined in Section 5. At this point the question of the associativity of this mixed type of matrix product arises. It can easily be shown that the associative law is satisfied, with the following proviso: if, in performing the indicated matrix multiplications in an arbitrary order, two matrices of similar fabric are brought together with the symbol $\otimes$ between them, then this symbol may be dropped, following which the usual rules of matrix multiplication may be applied.

Use of the estimate in equation (6.13) requires of the biologist some 
additional information. He must know, or be able to estimate, the 
size of each of the compartments in the biological system under con- 
sideration. 
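The concentration form of the estimator can be sketched in the same way as the total-count form. Everything in the example is hypothetical: the function name `estimate_from_concentrations`, the rates, and the compartment sizes. The function follows (6.13). It rescales $v^*$ by $m^*$ (the product $M^*v^*$), forms $w$ from the concentrations, applies $A_i^{K-i}$, and divides block $i$ by $m_i$. Synthetic data built from known rates and sizes, then converted to concentrations, should give back those rates exactly.

```python
import numpy as np

def a_matrix(K, i):
    """A_i of Section 4 (1-based i); A_K is taken to be the unit matrix I_{K-1}."""
    n = K - 1
    if i == K:
        return np.eye(n)
    A1 = np.zeros((n, n))
    A1[0, :] = -1.0
    A1[np.arange(1, n), np.arange(n - 1)] = 1.0
    rest = list(range(1, n))
    return A1[rest[:i - 1] + [0] + rest[i - 1:], :]

def estimate_from_concentrations(ustar, vstar, m):
    """Sketch of (6.13). ustar: (n-1) x K concentrations u*_{k nu};
    vstar: (n-1) x (K-1) values v*_{j nu}; m: compartment sizes m_1, ..., m_K."""
    ustar = np.asarray(ustar, dtype=float)
    vstar = np.asarray(vstar, dtype=float)
    m = np.asarray(m, dtype=float)
    K = ustar.shape[1]
    v = vstar * m[:K - 1]                  # M* v*: component j multiplied by m_j
    w = np.linalg.inv(ustar.T @ ustar)     # inverse of [sum_nu u*_{i nu} u*_{k nu}]
    g = w @ ustar.T @ v
    lam = np.zeros((K, K))
    for i in range(1, K + 1):
        block = np.linalg.matrix_power(a_matrix(K, i), K - i) @ g[i - 1] / m[i - 1]
        lam[i - 1, [j for j in range(K) if j != i - 1]] = block
    return lam

# Hypothetical rates and compartment sizes; totals are converted to concentrations.
lam_true = np.array([[0.0, 0.6, 0.2],
                     [0.3, 0.0, 0.4],
                     [0.05, 0.1, 0.0]])
Q = lam_true - np.diag(lam_true.sum(axis=1))
m = np.array([5.0, 40.0, 2.0])
rng = np.random.default_rng(2)
u_tot = rng.uniform(1.0, 10.0, size=(8, 3))   # stand-in total amounts X_k(t_nu)
v = (u_tot @ Q)[:, :2]                        # exact v_{j nu} for j = 1, 2
lam_hat = estimate_from_concentrations(u_tot / m, v / m[:2], m)
```

The sizes enter only through the rescaling of $v^*$ and the final division by $m_i$. This is why the biologist must know, or estimate, every $m_i$.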

So far no properties, either large- or small-sample, have been deduced for the estimate $\hat\lambda$, our main efforts having been concentrated on simply obtaining an estimate. However, further developmental work on this problem is continuing, and it is expected that eventually the large-sample properties of the estimate will be published. In this connection one might suspect that the more finely the interval $[0, T]$ is divided, the greater will be the “accuracy,” in some as yet undefined sense, of the estimate. This is closely related to the following question which may be of interest to the experimenter: “Should the process be observed over a total period twice as long, or should it be observed for the same length of time, but observed twice as frequently?” Present indications seem to favor the latter course. In a numerical example kindly provided by one of the referees, in which exact values of the exponential solutions of the system (3.11) were used as “observations,” the true values of the parameters $\lambda_{ij}$ were recovered, using the methods of this paper, with about a three percent positive bias, for $K = 2$ and $t = 0, 1, 2, 3$. To gain some feeling for the above question, the estimates of $\lambda_{ij}$ were recomputed for times $t = 0, 1, 2, 3, 4, 5, 6$ and found to be indistinguishable from the first estimates as far out as the fifth significant figure. Next the estimates were recomputed for times $t = 0, 0.5, 1, 1.5, 2, 2.5, 3$. In this case the positive bias of the estimates was reduced to about one percent.

Though this falls short of a rigorous proof, it does lend support to the feeling that there is more to be gained by sampling twice as often than by sampling over twice as long a time. Such an approach has been suggested, for example, by Mann [4] in estimating the parameters occurring in the Brownian motion and Ornstein-Uhlenbeck processes.
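The same kind of experiment can be repeated with exact “observations.” The sketch below does not reproduce the referee's example, whose rates and feeding schedule are not reported. It assumes $K = 2$, hypothetical rates $a = \lambda_{12} = 1$ and $b = \lambda_{21} = 0.5$, and constant feeding $f(t) = t$ into $R_1$. For these assumptions (3.11) gives $\mu_1(t) = \frac{b}{a+b}\,t + \frac{a}{(a+b)^2}\left(1 - e^{-(a+b)t}\right)$. With $K = 2$ the Section 5 estimator reduces to ordinary least squares on the columns $-u_1$ and $u_2$, because $A_1 = [-1]$ and $A_2 = [1]$. The size, and even the sign, of the resulting errors depends on these assumptions, so the sketch only shows how the sampling designs can be compared.

```python
import numpy as np

def mu1(t, a, b):
    """Exact mu_1(t) for K = 2, f(t) = t fed into R_1, lambda_12 = a, lambda_21 = b."""
    s = a + b
    return (b / s) * t + (a / s ** 2) * (1.0 - np.exp(-s * t))

def estimate_k2(times, a, b):
    """Section 5 procedure on exact 'observations' at the given times (K = 2).
    Returns (lambda_12_hat, lambda_21_hat)."""
    t = np.asarray(times, dtype=float)
    u1 = mu1(t, a, b)
    u2 = t - u1                                  # closed system: u1 + u2 = f(t) = t
    dq = (u1[2:] - u1[:-2]) / (t[2:] - t[:-2])   # central difference at t_1..t_{n-1}
    v = dq - 1.0                                 # subtract p_1(0+) f'(t) = 1
    B = np.column_stack([-u1[1:-1], u2[1:-1]])   # A_1 = [-1], A_2 = [1] when K = 2
    est, *_ = np.linalg.lstsq(B, v, rcond=None)
    return est

a, b = 1.0, 0.5                                  # hypothetical true rates
truth = np.array([a, b])
designs = {
    "original": np.linspace(0.0, 3.0, 4),        # t = 0, 1, 2, 3
    "twice_as_long": np.linspace(0.0, 6.0, 7),   # t = 0, 1, ..., 6
    "twice_as_often": np.linspace(0.0, 3.0, 7),  # t = 0, 0.5, ..., 3
    "very_fine": np.linspace(0.0, 3.0, 301),     # step 0.01 on [0, 3]
}
errors = {name: float(np.max(np.abs(estimate_k2(t, a, b) - truth) / truth))
          for name, t in designs.items()}
```

Under these assumptions the error comes from the difference quotients, which shrink with the spacing, so the finer designs recover the rates more closely. Printing `errors` shows how the longer design compares.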

It should be noted that although the investigation has been cast 
within the framework of a biological problem, its applicability is by 
no means restricted to the field of biology. It should be clear that the 
above procedure is applicable to any system which is evolving in time, 
and which can be described by means of a system of linear equations 
with constant coefficients. 
AN APPLICATION OF REGRESSION TO
FREQUENCY GRADUATION¹


ROBERT J. BUEHLER


Statistical Laboratory, Iowa State University, Ames, Iowa, U.S. A. 


1. INTRODUCTION 


From observed values $x$ grouped in a frequency distribution it is desired to determine the parameters $\beta_i$ of a function $\varphi(x) = \beta_1\varphi_1(x) + \cdots + \beta_m\varphi_m(x)$ in such a way that $\varphi$ has approximately some specified frequency distribution $g(\varphi)$, for example a normal distribution. In this way the observed frequency distribution may be graduated by a smooth function. Explicit formulas for efficient large-sample estimates of the parameters $\beta_i$ are given, and some connections with bio-assay problems are indicated.


2. RELATED EARLIER WORK 


If observed values $x$ arise from a density $f(x)$ and if $g(y)$ is some standard density, then there always exists a transformation $y = \varphi(x)$ such that $\varphi$ has the density $g(\varphi)$. The practical problems lie in the choice of the functional forms of $g$ and $\varphi$ and in specifying how the observations may be used to estimate parameters occurring in $\varphi$. This approach to frequency graduation was introduced by Edgeworth [1898] and was called by him the “method of translation.” Edgeworth took the inverse function $x = \varphi^{-1}(y)$ to be a cubic function of a normal variate $y$ and used the method of moments to determine the four constants of the cubic (for a brief description, see Pretorius [1930], p. 116).

When $g(y)$ is a normal density and when $\varphi$ itself, rather than $\varphi^{-1}$, is a polynomial, a series expansion method due to Cornish and Fisher [1937] applies, and observed or theoretical cumulants determine approximately the distribution and percentile points of $x$. More recently, Johnson [1949] has taken $g(y)$ to be normal and $\varphi$ to be
$$y = \varphi(x) = \gamma + \delta\,f\!\left(\frac{x - \xi}{\lambda}\right),$$
in which $\gamma$, $\delta$, $\xi$, and $\lambda$ are adjustable parameters. Some general theory is developed by Johnson for arbitrary $f$, and the problem of fitting by the method of moments is worked out for three special cases called $S_L$, $S_B$, and $S_U$ arising from
$$f(u) = \log u, \qquad \log[u/(1-u)], \qquad \text{and} \qquad \log\left(u + \sqrt{u^2 + 1}\right).$$
The Cornish-Fisher and Johnson theories are described by Kendall and Stuart [1958], Sections 6.25 and 6.27. Other less directly related similar work is discussed by Pretorius [1930] and by Johnson [1949].

¹ This work was supported by the Office of Ordnance Research, U.S. Army, under Contract No. DA-11-022-ORD-2732.

Hammersley and Morton [1954] work with grouped data, take the standard density $g(y)$ to be arbitrary, and use the method of weighted least squares to determine parameters $\alpha$ and $\beta$ so that the density $\beta g(\alpha + \beta x)$ best fits the observed distribution. The present paper exploits the same method in greater generality by replacing the transformation $\varphi(x) = \alpha + \beta x$ by the general form
$$\varphi(x) = \beta_1\varphi_1(x) + \cdots + \beta_m\varphi_m(x),$$
in which the $\varphi_i(x)$ are arbitrarily specified functions.

The relation to the quantal response bio-assay problem recently studied by White and Graca [1958] may also be noted. If a proportion $r/n$ of a population responds to a given stimulus within time $t$, and if $y$ is a response metameter (e.g., logit $r/n$), one may wish to estimate parameters $\beta_i$ of a function $y(t) = \sum \beta_i\varphi_i(t)$ from observed proportions $r_j/n$ at fixed times $t_j$. Then there is a formal correspondence between:

response metameter $y$ and percentile point $y$,
fixed times $t_j$ and fixed class boundaries $x_j$,
proportion $r_j/n$ and proportion of observations less than $x_j$.

White and Graca's problem actually is more general in that $y$ depends also on $m$ levels of a dose metameter. But if $m$ is put equal to 1, then their estimates derived by minimum modified chi-square are formally the same as those derived by the regression analysis of Hammersley and Morton. Although White and Graca's numerical example includes only linear dependence on $t$, their analysis is sufficiently general to treat $y = \sum \beta_i\varphi_i(t)$. (In comparing White and Graca's work, note that their $g, g', v, w, y, z, \theta$ correspond to our $G, g, nv, nw, y, z, \beta$.)


3. THE ESTIMATING EQUATIONS AND THE 
GRADUATED FREQUENCIES 


If z has density f(z), then the function y = ¢(x) having a specified 
density g(y) is obtained by integrating the differential equation 
the parameters $\beta_i$ of the density (5) will be estimated from $n$ observations $x$ which have been grouped into $k$ class intervals:


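$$\begin{array}{lcc}
 & \multicolumn{2}{c}{\text{Observed frequency}}\\
\text{Class interval} & \text{absolute} & \text{relative}\\ \hline
x < x_1 & n_1 & p_1 = n_1/n\\
x_1 < x < x_2 & n_2 & p_2 = n_2/n\\
\vdots & \vdots & \vdots\\
x_{k-1} < x & n_k & p_k = n_k/n
\end{array}$$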


Let us denote cumulative relative frequencies by $s_j$:
$$s_j = p_1 + \cdots + p_j, \qquad j = 1, \ldots, k, \tag{6}$$
and denote by $y_j$ the corresponding percentile points of the density $g(y)$:
$$s_j = G(y_j) = \int_{-\infty}^{y_j} g(y)\,dy. \tag{7}$$

Now for any given density (5) and for any fixed class intervals the observed quantities $p_1, \ldots, p_k$ will have a multinomial distribution, and with increasing sample size the multinomial variates $p_j$ are known to converge in probability to their expectations. It follows that, if the true distribution of $x$ has the assumed form, then $y_j$ converges in probability to $\varphi(x_j) = \sum_i \beta_i\varphi_i(x_j)$ for $j = 1, \ldots, k-1$.

For large samples, therefore, we may consider the regression of the $k - 1$ variates $y_j$ on the fixed class boundaries $x_j$ in order to obtain asymptotically unbiased estimates of the $m$ parameters $\beta_i$. The efficiency of the ordinary least squares regression technique rests on the assumptions of uncorrelated and homoscedastic errors: these assumptions are clearly not satisfied by the $y$'s, in view of their relation to the correlated multinomial variates $p_j$. But the covariance structure of the $y$'s can be determined (or rather, can be estimated), so that the method of weighted least squares of Aitken [1935] can be used to obtain asymptotically efficient estimates.

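The best estimate of the vector $\beta = (\beta_1, \ldots, \beta_m)'$ (in the Gauss-Markoff sense) is given by
$$\hat\beta = (X'WX)^{-1}X'Wy, \tag{8}$$
in which
$$y = (y_1, \ldots, y_{k-1})', \qquad (X')_{ij} = \varphi_i(x_j), \tag{9}$$
and $W$ is the inverse of the covariance matrix of the $y$'s. The matrix $W$ is not known, but asymptotically efficient estimates are obtained by substituting the asymptotic form (see Hammersley and Morton [1954] or appendix), taken here without the factor $n$, which cancels in (8):
$$W = \begin{bmatrix} V_1 & W_1 & 0 & \cdots & 0\\ & V_2 & W_2 & \cdots & 0\\ & & \ddots & \ddots & \vdots\\ & & & V_{k-2} & W_{k-2}\\ (\text{symm.}) & & & & V_{k-1} \end{bmatrix}, \tag{10}$$
where
$$V_j = z_j^2\left(\frac{1}{p_j} + \frac{1}{p_{j+1}}\right) \quad (j = 1, \ldots, k-1), \tag{11}$$
$$W_j = -\frac{z_jz_{j+1}}{p_{j+1}} \quad (j = 1, \ldots, k-2), \tag{12}$$
$$z_j = g(y_j) \quad (j = 1, \ldots, k-1). \tag{13}$$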


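The graduated frequencies are given by
$$\hat n_j = n\{G(\hat y_j) - G(\hat y_{j-1})\}, \quad \text{where} \quad \hat y_j = \sum_i \hat\beta_i\varphi_i(x_j), \tag{14}$$
with $G(\hat y_0) = 0$ and $G(\hat y_k) = 1$. There is an advantage here over Edgeworth's method in that $\hat y_j$ is found from an explicit rather than an implicit formula.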
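The fitting steps (6) through (14) can be sketched compactly. This is a hedged illustration, not the author's computation. The function name `graduate` is hypothetical, and $g$ and $G$ are taken to be the standard normal via Python's `statistics.NormalDist`. The weight matrix is the tridiagonal one with diagonal $z_j^2(1/p_j + 1/p_{j+1})$ and off-diagonal $-z_jz_{j+1}/p_{j+1}$, $z_j = g(y_j)$, which is $n^{-1}$ times the inverse of the asymptotic covariance matrix of the $y_j$. Every class must have a positive count. In the check, frequencies are generated exactly from a hypothetical model $y = 0.5 + 0.8x$, so the fit should return those coefficients and reproduce the frequencies.

```python
import numpy as np
from statistics import NormalDist

def graduate(counts, bounds, basis, dist=NormalDist()):
    """counts: n_1..n_k; bounds: interior class boundaries x_1..x_{k-1};
    basis: functions phi_1..phi_m. Returns (beta_hat, graduated frequencies)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p = counts / n                                       # relative frequencies p_j
    s = np.cumsum(p)[:-1]                                # s_1..s_{k-1}, eq. (6)
    y = np.array([dist.inv_cdf(sj) for sj in s])         # y_j = G^{-1}(s_j), eq. (7)
    z = np.array([dist.pdf(yj) for yj in y])             # z_j = g(y_j)
    X = np.array([[phi(xj) for phi in basis] for xj in bounds])   # row j: phi_i(x_j)
    W = np.diag(z ** 2 * (1.0 / p[:-1] + 1.0 / p[1:]))   # diagonal weights
    off = -z[:-1] * z[1:] / p[1:-1]                      # off-diagonal weights
    W += np.diag(off, 1) + np.diag(off, -1)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)     # eq. (8)
    y_hat = X @ beta
    G = np.concatenate(([0.0], [dist.cdf(v) for v in y_hat], [1.0]))
    return beta, n * np.diff(G)                          # eq. (14)

# Hypothetical check: frequencies generated exactly from y = 0.5 + 0.8 x.
std = NormalDist()
bounds = [-2.0, -1.0, 0.0, 1.0, 2.0]
cum = [0.0] + [std.cdf(0.5 + 0.8 * x) for x in bounds] + [1.0]
counts = 10000 * np.diff(cum)
beta, fitted = graduate(counts, bounds, [lambda x: 1.0, lambda x: x])
```

A cubic graduation as in Section 7 would pass the four basis functions $1, x, x^2, x^3$.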


4. CHI-SQUARE AND F TESTS 


The chi-square test is appropriate for testing the agreement between the observed frequencies $n_j = np_j$ and the graduated frequencies $\hat n_j$. If there are $k$ cells and $m$ parameters the quantity
$$\chi^2 = \sum_{j=1}^{k}\frac{(n_j - \hat n_j)^2}{\hat n_j} \tag{15}$$
has approximately a chi-square distribution with $(k - m - 1)$ degrees of freedom. Of course the distribution theory rests on the assumption (or null hypothesis) that $f$ is in fact a density of the form (5). Now, I anticipate that the form (5) will rarely be justifiable on the basis of some physical model; nevertheless the value of chi-square can serve as a rough guide.

Another test, based on the $F$ statistic, is appropriate for determining whether any particular $\beta$, say the last one, $\beta_m$, may reasonably be put equal to zero. Let $\hat y$ and $\hat y_0$ be respectively the estimates for the full model and the restricted model in which $\beta_m = 0$. The test statistic is the ratio of weighted sums of squares,
$$F = \frac{(y - \hat y_0)'W(y - \hat y_0) - (y - \hat y)'W(y - \hat y)}{(y - \hat y)'W(y - \hat y)/(k - m - 1)}, \tag{16}$$
which is distributed as $F$ with 1 and $k - m - 1$ degrees of freedom under the null hypothesis that the restricted model is the true one, and which is sensitive to departures from the null hypothesis arising from a full model with a nonzero $\beta_m$. This $F$ test is the one that commonly serves as a guide in choosing the appropriate number of parameters. Equation (16) is not a particularly suitable form for computations, owing to the necessity of making an independent fit of the restricted model. If the number of parameters were one of the main considerations, then the more sophisticated method of orthogonal functions would be superior to straightforward matrix multiplication and inversion. The work of Aitken [1933] applies when $\varphi(x)$ is a polynomial and the class intervals are equal. Unfortunately most of the advantage of this method is lost by virtue of the random nature of $W$, which necessitates redetermination of the orthogonal functions for each new set of observations.
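Both statistics are easy to compute once the fits are available. In this sketch the function names are hypothetical. The $F$ ratio is written in the standard one-restriction form $(S_0 - S)/[S/(k - m - 1)]$, where $S$ and $S_0$ are the weighted residual sums of squares of the full and restricted fits; this is how the ratio of weighted sums of squares above is read here. Each call refits the restricted model, which is exactly the computational inconvenience just mentioned. The small example uses unit weights so the values can be checked by hand.

```python
import numpy as np

def chi_square(observed, graduated):
    """Eq. (15): sum over cells of (n_j - nhat_j)^2 / nhat_j."""
    o = np.asarray(observed, dtype=float)
    e = np.asarray(graduated, dtype=float)
    return float(np.sum((o - e) ** 2 / e))

def wls_fitted(X, W, y):
    """Fitted values X beta_hat with beta_hat = (X'WX)^{-1} X'Wy, as in eq. (8)."""
    return X @ np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

def f_drop_last(X, W, y):
    """F ratio for beta_m = 0, with 1 and (k - 1) - m degrees of freedom, where X
    has one row per class boundary (k - 1 rows) and one column per parameter."""
    rows, m = X.shape
    r_full = y - wls_fitted(X, W, y)
    r_restricted = y - wls_fitted(X[:, :-1], W, y)
    S = r_full @ W @ r_full
    S0 = r_restricted @ W @ r_restricted
    return float((S0 - S) / (S / (rows - m)))

# Hand-checkable case with unit weights: straight line versus constant.
X = np.column_stack([np.ones(4), np.arange(4.0)])
y = np.array([1.0, 3.0, 2.0, 4.0])
F = f_drop_last(X, np.eye(4), y)       # S0 = 5.0, S = 1.8, so F = 3.2 / 0.9 = 32/9
chi = chi_square([10, 20], [12, 18])   # 4/12 + 4/18 = 5/9
```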


5. CHI-SQUARE MINIMIZATION 


White and Graca [1958] use (15) as a starting point, replace it by the “modified chi-square,” which has $n_j$ rather than $\hat n_j$ in the denominator, and obtain estimating equations agreeing with those given above by minimizing the resulting expression, using, of course, further asymptotic approximations. Since the two procedures (chi-square and regression) lead to the same results, the regression estimates are asymptotically minimum chi-square estimates. This is not surprising in view of the known large-sample equivalence of minimum modified chi-square and maximum likelihood (Cramér [1946], Sections 30.3 and 33.4). White and Graca's calculations further show that (15) is asymptotically equal to $n$ times the weighted sum of squares of deviations, i.e.,
$$\sum_{j=1}^{k}\frac{(n_j - \hat n_j)^2}{\hat n_j} \approx n\,(y - \hat y)'W(y - \hat y), \quad \text{asymptotically}. \tag{17}$$


6. DIFFICULTIES ARISING FROM MULTIVALUED 
TRANSFORMATIONS 


An unfortunate difficulty with expressions of the form $\varphi(x) = \sum \beta_i\varphi_i(x)$ is the possibility of a multivalued inverse function $\varphi^{-1}$, that is, the possibility of $\varphi$ having maxima or minima. For example, if $g$ is normal and if $\varphi$ is quadratic, then the range of $\varphi$ is from its minimum value to $\infty$ (or from $-\infty$ to its maximum value), rather than the whole real axis, as is theoretically required by the range of the normal variate. A quadratic function might nevertheless provide a satisfactory graduation of observed data if it reversed slope far out on the tail of the distribution and therefore was monotone over the range of interest. With $\varphi$ a cubic function one will or will not have a single-valued $\varphi^{-1}$ depending on the nonexistence or existence of real roots of the quadratic function $\varphi'$. In the numerical example of Section 7, $\varphi$ has the desired positive slope over the range of interest, but eventually has negative slope. Similar difficulties were encountered by Edgeworth in using a cubic function for $\varphi^{-1}$.


7. NUMERICAL EXAMPLE: CUBIC TRANSFORMATION 
TO NORMALITY 


For an example we have chosen the oft-graduated bean data of Johannsen (see Pretorius [1930] p. 157, Johnson [1949] p. 172, Kendall and Stuart [1958] p. 20). We have attempted to represent the distribution of a cubic function ($m = 4$ parameters) of the observed breadth of beans by a normal distribution. The total number of beans measured was $n = 9440$, and the number of class intervals for breadth was $k = 12$. The original class intervals were all 0.25 millimeters long, with boundaries ranging from 6.25 to 9.25 millimeters. Since any linear transformation of the class boundaries leaves the graduated frequencies unchanged, it was convenient to replace the actual values $6.25, 6.50, \ldots, 9.25$ by $-6, -5, \ldots, +6$. Table 1 shows the calculation of the matrix $W$ from the observed frequencies.

Taking $\phi(x) = \beta_1 + \beta_2 x + \beta_3 x^2 + \beta_4 x^3$ and $x = -5, -4, \ldots, +5$ leads to the matrix below. The values $\pm 6$ do not enter; the calculations would not be changed if they were replaced by $\pm\infty$.

$$X' = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
-5 & -4 & -3 & -2 & -1 & 0 & 1 & 2 & 3 & 4 & 5 \\
25 & 16 & 9 & 4 & 1 & 0 & 1 & 4 & 9 & 16 & 25 \\
-125 & -64 & -27 & -8 & -1 & 0 & 1 & 8 & 27 & 64 & 125
\end{pmatrix}.$$


From $X'$ and the last two columns of Table 1 one obtains

$$X'WX = \begin{pmatrix}
0.95406 & 0.83558 & 2.5221 & 4.3259 \\
 & 4.1751 & 5.7359 & 29.630 \\
 & & 33.243 & 54.764 \\
\text{(symm.)} & & & 424.69
\end{pmatrix}.$$
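$W$ can be rebuilt from the counts alone. The minimal NumPy sketch below assumes that $W$ is taken per observation, i.e. the $n = 1$ form of the appendix relation $W = ZL'W_pLZ$, with observed proportions standing in for the expectations. Under that assumption its (1, 1) and (2, 2) elements come out close to the printed 0.95406 and 4.1751. The counts are the observed column of Table 3 below. Variable names are illustrative.

```python
import numpy as np
from statistics import NormalDist

# Observed bean-breadth counts in the 12 class intervals (observed column of Table 3).
counts = np.array([4, 10, 72, 170, 530, 1397, 2579, 2742, 1483, 400, 48, 5], dtype=float)
n = counts.sum()                       # 9440 beans
p = counts / n                         # observed proportions, standing in for the pi_j
ncells = len(p)                        # k = 12

# Cumulative proportions at the k - 1 interior boundaries, their normal deviates y_j,
# and the normal density z_j = g(y_j) at each deviate.
nd = NormalDist()
s = np.cumsum(p)[:-1]
y = np.array([nd.inv_cdf(float(v)) for v in s])
z = np.array([nd.pdf(float(v)) for v in y])

# Appendix relations, assumed here in per-observation form (n = 1):
# p = L s, W_p from the multinomial, and W = Z L' W_p L Z.
L = np.eye(ncells - 1) - np.eye(ncells - 1, k=-1)
W_p = np.diag(1.0 / p[:-1]) + 1.0 / p[-1]
Z = np.diag(z)
W = Z @ L.T @ W_p @ L @ Z

# Rows (1, x, x^2, x^3) at the coded boundaries x = -5, ..., 5.
x = np.arange(-5, 6, dtype=float)
X = np.vander(x, 4, increasing=True)
XtWX = X.T @ W @ X
print(np.round(XtWX, 4))
```

The (1, 1) element equals $\sum_i (z_i - z_{i-1})^2/p_i$, with $z_0 = z_k = 0$. This is the grouped-data information for location, which explains why it lies just below 1.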


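The inverse is found to be

$$(X'WX)^{-1} = \begin{pmatrix}
1.4592 & -0.28928 & -0.088317 & 0.016708 \\
 & 0.56556 & -0.019665 & -0.033976 \\
 & & 0.046259 & -0.0036937 \\
\text{(symm.)} & & & 0.0050313
\end{pmatrix}.$$

From $X'$, $W$, and the normal deviates $y$ of Table 1 we obtain

$$X'Wy = \begin{pmatrix} -0.017926 \\ 2.4759 \\ 3.3333 \\ 18.84 \end{pmatrix}.$$

The product of the last two expressions gives the estimates

$$\hat\beta = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_3 \\ \hat\beta_4 \end{pmatrix} = \begin{pmatrix} -0.72192 \\ 0.69962 \\ 0.037488 \\ -0.0019207 \end{pmatrix}.$$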
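The arithmetic of the last three displays can be rechecked with a short NumPy sketch that uses only the printed summaries. The fourth element of $X'Wy$, 18.84, is our reading of a value that is partly illegible in the source. The same sketch locates the turning points of $\phi$, i.e. the real roots of $\phi'(x) = \beta_2 + 2\beta_3 x + 3\beta_4 x^2$. These bear on the monotonicity question raised in Section 6.

```python
import numpy as np

# Printed X'WX (upper triangle filled in symmetrically) and its printed inverse.
XtWX = np.array([
    [0.95406, 0.83558, 2.5221, 4.3259],
    [0.83558, 4.1751, 5.7359, 29.630],
    [2.5221, 5.7359, 33.243, 54.764],
    [4.3259, 29.630, 54.764, 424.69],
])
XtWX_inv_printed = np.array([
    [1.4592, -0.28928, -0.088317, 0.016708],
    [-0.28928, 0.56556, -0.019665, -0.033976],
    [-0.088317, -0.019665, 0.046259, -0.0036937],
    [0.016708, -0.033976, -0.0036937, 0.0050313],
])
# X'Wy; the last element (18.84) is a reading of a partly illegible value.
XtWy = np.array([-0.017926, 2.4759, 3.3333, 18.84])
beta_printed = np.array([-0.72192, 0.69962, 0.037488, -0.0019207])

# Weighted least squares estimates from the normal equations (X'WX) b = X'Wy.
beta = np.linalg.solve(XtWX, XtWy)
print(beta)

# Monotonicity of phi(x) = b1 + b2 x + b3 x^2 + b4 x^3: phi'(x) = b2 + 2 b3 x + 3 b4 x^2.
b1, b2, b3, b4 = beta_printed
turning_points = np.sort(np.roots([3 * b4, 2 * b3, b2]).real)
x = np.arange(-5, 6)
slope = b2 + 2 * b3 * x + 3 * b4 * x ** 2
print(turning_points, slope.min() > 0)
```

Both roots of $\phi'$ lie outside the coded range $[-6, 6]$, one just below $-6$ and one far above $+6$. So $\phi$ is increasing wherever it is used but turns down beyond, as stated in Section 6.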


From these values the graduated percentile points $\hat y_j$ were obtained (using $\hat y = X\hat\beta$); they are shown in the second column of Table 2. For comparison, the third column of Table 2 shows the graduated values that result from the usual unweighted least squares method. The columns of deviations $y_j - \hat y_j$ show that weighted least squares gives a better fit near the center of the distribution and puts less emphasis on fitting $y$ in the tails. This was to be expected, since the observed deviates in the tails have the largest sampling variances and therefore receive the smallest weights. In its fourth column, Table 2 also shows the result of a conventional two-parameter fit. This fit is not intended to compete with the four-parameter fits; it only shows the extent of the non-normality of the data.

TABLE 2
OBSERVED AND GRADUATED PERCENTILE POINTS

                      Graduated values y^_j                         Deviations y_j - y^_j
           Least squares cubic           Ordinary normal     Least squares cubic
Observed   transformation to normality   fit using           transformation to normality
y_j        Weighted      Unweighted      (μ, σ²) only        Weighted      Unweighted
           -3.0428       -3.325          -4.341              -0.298        -0.012
-2.971     -2.7977       -2.975          -3.606              -0.173        +0.004
-2.361     -2.4315       -2.480          -2.870              +0.071        +0.119
-1.925     -1.9559       -1.939          -2.135              +0.031        +0.014
-1.3835    -1.3821       -1.333          -1.399              -0.0014       -0.051
-0.73475   -0.7219       -0.667          -0.664              -0.0129       -0.119
 0.01116    0.0133        0.053           0.072              -0.0021       -0.042
 0.82362    0.8119        0.819           0.808              +0.0117       +0.005
 1.6647     1.6625        1.624           1.543              +0.0022       +0.041
 2.536      2.5535        2.461           2.279              -0.018        +0.075
 3.274      3.4733        3.323           3.014              -0.199        -0.049

From the graduated percentile points we obtain, via a table of the normal distribution, the graduated frequencies given in the second and third columns of Table 3. For comparison, two earlier graduations are included: one from Pretorius [1930] based on a Pearson Type IV curve, and one from Johnson [1949] based on his translation system $S_U$. Each of the four systems tabulated uses four parameters. Following the earlier workers, we have pooled cell frequencies as indicated by the braces for the calculation of $\chi^2$; the cells were not pooled before the parameters were fitted. The original twelve cells are thus reduced to ten, leaving $10 - 4 - 1 = 5$ degrees of freedom for $\chi^2$. As measured by $\chi^2$, both earlier graduations are superior to the cubic transformation to normality used here. The suitability of transformations other than cubic, or of base distributions other than normal, has not been investigated. Comparison of the second and third columns clearly shows the superiority of the weighted over the unweighted least squares method.
TABLE 3
OBSERVED AND GRADUATED FREQUENCIES

             Frequencies graduated by
             least squares cubic trans-       Frequencies graduated by
Observed     formation to normality           System S_U of      Pearson
frequency n_j   weighted     unweighted       Johnson [1949]     Type IV*
     4    }      11.06  }       4.15  }               }                }
    10    }      13.23  }       9.68  }          13.8 }           13.3 }
    72           46.69         48.19             53.2             49.9
   170          167.3         185.8             182.2            177.2
   530          561.2         614.3             557.8            557.9
  1397         1420.7        1521.7            1394.2           1413.3
  2579         2549.9        2535.2            2507.0           2530.5
  2742         2702.4        2571.2            2757.0           2732.5
  1483         1512.5        1456.4            1544.4           1515.4
   400          404.7         427.7             381.5            393.6
    48    }      47.91  }      61.17  }          41.5 }           48.6 }
     5    }       2.43  }       4.15  }           2.4 }            3.0 }
χ²**             21.94         51.57             17.47            14.36
d.f.                 5             5                 5                5

*Taken from Pretorius [1930] p. 218.
**The pooling indicated by the braces follows that of Pretorius and Johnson.
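The route from the estimates to Table 3 can be sketched as follows. The sketch assumes the coded boundaries $x = -5, \ldots, 5$ and treats the two extreme cells as open-ended. $\hat y = X\hat\beta$ gives the graduated deviates, differences of the normal distribution function give the cell frequencies, and Pearson's $\chi^2$ is taken after the pooling shown by the braces. The normal CDF is written with `math.erf` to keep the example dependency-light. The helper names are illustrative.

```python
import math
import numpy as np


def normal_cdf(v):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


beta = np.array([-0.72192, 0.69962, 0.037488, -0.0019207])  # Section 7 estimates
x = np.arange(-5, 6, dtype=float)                          # coded interior boundaries
yhat = np.vander(x, 4, increasing=True) @ beta             # graduated percentile points

n = 9440
# Cell probabilities: CDF differences at the graduated boundaries, with
# probability 0 and 1 closing the two open-ended tail cells.
cdf = np.array([0.0] + [normal_cdf(v) for v in yhat] + [1.0])
graduated = n * np.diff(cdf)

observed = np.array([4, 10, 72, 170, 530, 1397, 2579, 2742, 1483, 400, 48, 5], dtype=float)
weighted_table3 = np.array([11.06, 13.23, 46.69, 167.3, 561.2, 1420.7, 2549.9,
                            2702.4, 1512.5, 404.7, 47.91, 2.43])


def pooled_chi2(obs, fitted):
    """Pearson chi-square after pooling the first two and the last two cells."""
    o = np.concatenate(([obs[:2].sum()], obs[2:-2], [obs[-2:].sum()]))
    e = np.concatenate(([fitted[:2].sum()], fitted[2:-2], [fitted[-2:].sum()]))
    return float(((o - e) ** 2 / e).sum()), len(o)


chi2, cells = pooled_chi2(observed, weighted_table3)
print(np.round(graduated, 2))
print(round(chi2, 2), "with", cells - 4 - 1, "degrees of freedom")
```

Applying `pooled_chi2` to the unweighted column in the same way gives the corresponding unweighted statistic.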


8. SUMMARY AND CRITIQUE 


The method of graduating frequency distributions by variate transformation was introduced by Edgeworth [1898], and has since been modified and extended by various workers. We have considered transformations of the form $y = \phi(x) = \sum \beta_i \phi_i(x)$, in which $x$ is the observed variate and $y$ is some standard variate. A general method of fitting has been described. Among the advantages of the present scheme may be listed:


(a) The theory is flexible in that the functions $\phi_1, \ldots, \phi_m$ are arbitrary, as is also the distribution of the standard variate $y$. A single straightforward method of estimating the parameters applies to all cases.

(b) The method of fitting is asymptotically efficient with respect to estimation of parameters and asymptotically minimizes chi-square.

(c) Graduated cell frequencies are easily obtained from the estimated parameters.
Some disadvantages are:

(a) Difficulties may arise from non-monotonicity of the transformation $\phi$.

(b) The moments of the fitted distribution are difficult to obtain analytically.

(c) In many problems it may be difficult to justify the class of transformations by a physical model.


The estimating equations follow from the regression analysis of Hammersley and Morton [1954] or from the minimum modified chi-square analysis of White and Graca [1958]. The present paper shows the connection between the bio-assay work of White and Graca and the frequency graduation work of Edgeworth, Johnson, Hammersley and Morton, and others.

A numerical example based on a cubic transformation to normality has been presented.
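APPENDIX
DETAILS OF THE REGRESSION ANALYSIS


Let the covariance matrices of $y$, $s$, and $p$ be denoted by $V$, $V_s$, and $V_p$, and their inverses by $W$, $W_s$, and $W_p$. From Equation (6),

$$p = Ls, \qquad \text{where } L = \begin{pmatrix} 1 & 0 & 0 & 0 & \cdots \\ -1 & 1 & 0 & 0 & \cdots \\ 0 & -1 & 1 & 0 & \cdots \\ \vdots & & \ddots & \ddots & \end{pmatrix}, \qquad (18)$$

and it follows that

$$V_p = L V_s L', \qquad W_s = L' W_p L. \qquad (19)$$

To relate $W$ and $W_s$, one has to the first order, by taking statistical differentials of (7),

$$\operatorname{Cov}(s_i, s_j) = g(y_i)\,g(y_j)\operatorname{Cov}(y_i, y_j) = z_i z_j \operatorname{Cov}(y_i, y_j). \qquad (20)$$

Thus

$$V_s = Z V Z, \qquad W = Z W_s Z, \qquad \text{where } Z = \operatorname{diag}(z_1, \ldots, z_{k-1}). \qquad (21)$$

Combining these results gives

$$W = Z L' W_p L Z. \qquad (22)$$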
Now $np$ is a vector of multinomial variates, and it is known (see for example Kendall and Stuart [1958], p. 356) that the main diagonal of $W_p$ is

$$n\left(\frac{1}{\pi_1} + \frac{1}{\pi_k}\right),\; n\left(\frac{1}{\pi_2} + \frac{1}{\pi_k}\right),\; \ldots,\; n\left(\frac{1}{\pi_{k-1}} + \frac{1}{\pi_k}\right)$$

and that all other elements are equal to $n/\pi_k$, where $\pi_1, \pi_2, \ldots$ are the multinomial expectations $Ep_1, Ep_2, \ldots$. The desired equations (10), (11), (12) now follow by straightforward substitution in (22), with the further large-sample approximation of replacing the expectations $\pi_i$ by the corresponding observations $p_i$.
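The multinomial result quoted above can be checked numerically. The minimal sketch below uses hypothetical cell probabilities. It builds $W_p$ from the stated diagonal and off-diagonal elements and confirms that $W_p$ inverts the covariance matrix $(\operatorname{diag}(\pi) - \pi\pi')/n$ of the first $k - 1$ proportions.

```python
import numpy as np


def multinomial_precision(pi, n):
    """W_p for the first k - 1 cell proportions of a multinomial sample of size n:
    diagonal n(1/pi_i + 1/pi_k), every off-diagonal element n/pi_k."""
    pi = np.asarray(pi, dtype=float)
    return n * (np.diag(1.0 / pi[:-1]) + 1.0 / pi[-1])


# Hypothetical cell probabilities (k = 4) and sample size.
pi = np.array([0.1, 0.2, 0.3, 0.4])
n = 50
head = pi[:-1]
V_p = (np.diag(head) - np.outer(head, head)) / n   # covariance of p_1, ..., p_{k-1}
W_p = multinomial_precision(pi, n)
print(np.allclose(V_p @ W_p, np.eye(len(head))))
```

Dropping the last cell is what makes $V_p$ nonsingular; the full $k \times k$ covariance matrix has rank $k - 1$ because the proportions sum to one.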
CRITICAL VALUES FOR DUNCAN'S NEW
MULTIPLE RANGE TEST


H. Leon Harter


Aeronautical Research Laboratories
Wright-Patterson Air Force Base, Ohio, U.S.A.


SUMMARY


David B. Duncan [2] has formulated a new multiple range test that makes use of special protection levels based upon degrees of freedom. Duncan [2, Tables II and III] has also tabulated the critical values (significant studentized ranges) for 5 percent and 1 percent level new multiple range tests, based upon tables by Pearson and Hartley [8] and by Beyer [1]. Unfortunately, there are sizable errors in some of the published critical values. This fact was discovered and reported by the author [4], who instigated the computation at Wright-Patterson Air Force Base of more accurate tables of the probability integrals of the range and of the studentized range than those published by Pearson and Hartley [7, 8]. This extensive computing project, one of whose primary objectives was the determination of more accurate critical values for Duncan's test, has now been completed. The purpose of this paper is to report critical values (to four significant figures) found by inverse interpolation in the new table of the probability integral of the studentized range. Included are corrected tables for significance levels $\alpha = 0.05, 0.01$ and new tables for significance levels $\alpha = 0.10, 0.005, 0.001$. All are given for sample sizes $n = 2(1)20(2)40(10)100$ and degrees of freedom $\nu = 1(1)20, 24, 30, 40, 60, 120, \infty$.


INTRODUCTION 


Multiple range tests are used for testing the significance of the range of $p$ successive values out of an ordered arrangement of $m$ means of samples of size $N$, where $p = 2, \ldots, m$. First one tests the significance of the range of all $m$ means by comparing it with the critical range for the desired level of significance. If the range of all $m$ means is found to be significant, one next tests the significance of the range of $(m - 1)$ successive means, omitting first the largest and then the smallest (or vice versa; the order is unimportant). If either of these tests on $(m - 1)$ means shows significance, one then proceeds with tests on $(m - 2)$ successive means, and so on until no further groups are found to have significant ranges. Whenever the range of any group is found to be non-significant, one concludes that the entire group has come from a homogeneous source, and no test is made on the range of any subgroup of that group. Multiple range tests differ from fixed range tests in that the critical range of $p$ means usually decreases as $p$ decreases, rather than remaining constant.
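The step-down logic just described can be sketched in a few lines. The function name and the critical ranges below are hypothetical. In practice the critical range for $p$ means would be the significant studentized range multiplied by the standard error of a mean. A group is tested only if it does not lie inside a group already declared non-significant.

```python
def multiple_range_test(means, critical_range):
    """Step-down multiple range test on a set of means.

    critical_range maps a group size p to the critical range for p successive
    ordered means. Returns the ordered means and the maximal groups, as
    (first, last) index pairs, whose ranges were found non-significant.
    """
    xs = sorted(means)
    m = len(xs)
    homogeneous = []
    for p in range(m, 1, -1):                  # largest groups first
        for first in range(m - p + 1):
            last = first + p - 1
            # No test is made on a subgroup of a non-significant group.
            if any(a <= first and last <= b for a, b in homogeneous):
                continue
            if xs[last] - xs[first] <= critical_range[p]:
                homogeneous.append((first, last))
    return xs, homogeneous


# Hypothetical means and critical ranges (the critical range shrinks as p decreases).
xs, groups = multiple_range_test([12, 10, 20, 11], {2: 2.5, 3: 2.6, 4: 2.7})
print(xs, groups)
```

In this example the three smallest means form one homogeneous group. The largest mean is separated from it, and the pairs inside the group are never tested.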

The new multiple range test proposed by Duncan [2] makes use of special protection levels based upon degrees of freedom. Let $\gamma_{2,\alpha} = 1 - \alpha$ be the protection level for testing the significance of a difference between two means, that is, the probability that a significant difference between sample means will not be found if the population means are equal. Duncan reasons that one has $(p - 1)$ degrees of freedom for testing $p$ means, and hence one may make $(p - 1)$ independent tests, each with protection level $\gamma_{2,\alpha}$. Hence the joint protection level is

$$\gamma_{p,\alpha} = \gamma_{2,\alpha}^{\,p-1} = (1 - \alpha)^{p-1}, \qquad (1)$$

that is, under the hypothesis that all $p$ population means are equal, $\gamma_{p,\alpha}$ is the probability of finding no significant differences in $(p - 1)$ independent tests, each at protection level $\gamma_{2,\alpha}$.
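Equation (1) is simple to tabulate. The small sketch below, for $\alpha = 0.05$, shows how quickly the joint protection level falls, and the corresponding joint significance level $1 - \gamma_{p,\alpha}$ rises, as $p$ grows. The function name is illustrative.

```python
def protection_level(p, alpha):
    """Duncan's joint protection level for p means: (1 - alpha) ** (p - 1)."""
    return (1.0 - alpha) ** (p - 1)


for p in (2, 3, 5, 10):
    gamma = protection_level(p, 0.05)
    print(p, round(gamma, 4), round(1 - gamma, 4))
```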


CRITICAL VALUES FOR DUNCAN’S TEST 


On the basis of the protection levels $\gamma_{p,\alpha}$ given by (1) for tests on $p$ means, Duncan [2, Tables II and III] has tabulated, for $\alpha = 0.05, 0.01$, the factor $Q(p, \nu, \alpha)$ by which the standard error of the mean must be multiplied to obtain the critical range for Duncan's new multiple range test. In the sequel, this factor $Q(p, \nu, \alpha)$ will be called the critical value or the significant studentized range for Duncan's test.

As mentioned earlier, Duncan's tables of significant studentized ranges are based upon tables by Pearson and Hartley [8] and by Beyer [1]. The tabular values for $2 \le p \le 20$ and $10 \le \nu \le \infty$ were obtained by inverse interpolation in the Pearson-Hartley tables of the probability integral of the studentized range, while the remaining values were computed by Beyer, using new methods. The Pearson-Hartley tables of the probability integral ${}_{\nu}P_n(Q)$ of the studentized range, with $\nu$ degrees of freedom for the independent estimate $s^2$ of population variance, are based upon their earlier tables of the probability integral $P_n(Q)$ of the range of $n$ observations from a normal population. To correct for finite degrees of freedom, they use the relation

$${}_{\nu}P_n(Q) = P_n(Q) + \nu^{-1} a_n(Q) + \nu^{-2} b_n(Q). \qquad (2)$$

The tables give values of $P_n(Q)$, $a_n(Q)$ and $b_n(Q)$ (to four, two and one decimal places, respectively) for $Q = 0.00(0.25)6.50$ and $n = 3(1)20$. They note that the results are somewhat inaccurate for small values of $\nu$ ($< 10$) and large values of $Q$ ($> 6$). Actually, the tables are inaccurate not only for $\nu < 10$ but also for values of $\nu$ up to about 20, and the inaccuracy for high values of $Q$ is much greater than was anticipated. These inaccuracies, which were due to the limitations of formula (2), in turn caused errors in the published critical values for Duncan's test. Beyer was aware of the difficulty for $\nu < 10$ and attempted to correct it by adding a term of the form $\nu^{-3} c_n(Q)$ to the right-hand side of (2). This alleviated the difficulty to some extent but did not remove it, and nothing at all was done to correct the inaccuracies for $\nu \geq 10$. The author first became aware of this situation during an investigation of the relation between error rates and sample sizes of multiple comparisons tests based on the range (see reference [3]). He reported it [4] in a paper presented to the American Statistical Association, which included an outline of plans for the computation of more accurate tables.


COMPUTATION OF THE TABLE 


The computation of more accurate critical values for Duncan's test required a more accurate table of the probability integral of the studentized range, and this in turn required a more accurate table of the probability integral of the range. Dr. Gertrude Blanch gave invaluable assistance in the numerical analysis. Donald S. Clemm programmed the computation of the probability integrals of the range and of the studentized range for the Univac Scientific (ERA 1103) computer. Eugene H. Guthrie programmed for the ERA 1103A the inverse interpolation necessary to obtain the critical values for Duncan's test.

The methods of computation of the probability integrals of the range and of the studentized range, together with voluminous tables, have been reported by Harter and Clemm [5] and by Harter, Clemm and Guthrie [6], and will not be repeated here. The method of inverse interpolation employed, an iterative one suggested by Major John V. Armitage, involves the following steps:

1. In the table of the probability integral of the studentized range for $n = p$ and the desired value of $\nu$, find the two successive probabilities $y_0$ and $y_1$ between which the required protection level $P = \gamma_{p,\alpha} = (1 - \alpha)^{p-1}$ lies. Call the two corresponding arguments (studentized ranges) $x_0$ and $x_1$, respectively. The required studentized range $Q = R(p, \nu, \gamma_{p,\alpha})$ will lie between $x_0$ and $x_1$.

2. Compute the tolerance $T$ for $P$ corresponding to a tolerance $5 \times 10^{u-5}$ for $Q$ by means of the equation $T = (\Delta P/\Delta Q) \times 5 \times 10^{u-5}$, where $\Delta P = y_1 - y_0$, $\Delta Q = x_1 - x_0$, and $u$ is the number of digits before the decimal point in numbers between $x_0$ and $x_1$.

3. Perform linear inverse interpolation to find an approximation $x$ to the required $R(p, \nu, \gamma_{p,\alpha})$, using the relation

$$x = \frac{(x_1 - x_0)(P - y_0)}{y_1 - y_0} + x_0.$$

4. Perform direct interpolation to find the probability $y$ corresponding to the value $x$ of the studentized range. Use Aitken's method with a tolerance of $5 \times 10^{-7}$, with provision for up to 16-point interpolation if the tolerance is not met with fewer points.

5. Compare the result $y$ of step 4 with the required probability $P$, using the tolerance $T$ computed in step 2:

   a. If $|y - P| < T$, stop and set $R(p, \nu, \gamma_{p,\alpha}) = x$.

   b. If $(y - P) > T$, replace $y_1$ by $y$ and $x_1$ by $x$, then repeat the process, starting with step 3.

   c. If $(y - P) < -T$, replace $y_0$ by $y$ and $x_0$ by $x$, then repeat the process, starting with step 3.
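Steps 1 to 5 amount to a regula falsi search in which each trial point is checked by Aitken-Neville interpolation in the direct table. The Python sketch below mirrors them under stated assumptions. The table is a list of increasing probabilities, and the tolerance $T$ is fixed from the initial bracket, as in step 2. A table of the normal distribution function at spacing 0.25 stands in for the studentized-range table, which is not reproduced here. The function names are illustrative.

```python
from statistics import NormalDist


def aitken_interpolate(xs, ys, x, tol=5e-7, max_points=16):
    """Direct interpolation at x by the Aitken-Neville scheme.

    Table points are used nearest-first; stops once two successive
    estimates agree to within tol, using at most max_points entries.
    """
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:max_points]
    px = [xs[i] for i in order]
    p = [ys[i] for i in order]   # p[j] holds the estimate from points j..k
    estimate = p[0]
    for k in range(1, len(px)):
        for j in range(k - 1, -1, -1):
            p[j] = ((x - px[k]) * p[j] + (px[j] - x) * p[j + 1]) / (px[j] - px[k])
        if abs(p[0] - estimate) < tol:
            return p[0]
        estimate = p[0]
    return estimate


def inverse_interpolate(xs, ys, target, u=1, max_iter=50):
    """Find x with P(x) = target in a table of increasing probabilities."""
    # Step 1: bracket the target between successive tabulated probabilities.
    i = next(i for i in range(len(ys) - 1) if ys[i] <= target <= ys[i + 1])
    x0, x1, y0, y1 = xs[i], xs[i + 1], ys[i], ys[i + 1]
    # Step 2: tolerance on P equivalent to 5 x 10^(u - 5) on x.
    T = (y1 - y0) / (x1 - x0) * 5 * 10.0 ** (u - 5)
    for _ in range(max_iter):
        x = (x1 - x0) * (target - y0) / (y1 - y0) + x0   # step 3
        y = aitken_interpolate(xs, ys, x)                 # step 4
        if abs(y - target) < T:                           # step 5a
            return x
        if y > target:                                    # step 5b
            x1, y1 = x, y
        else:                                             # step 5c
            x0, y0 = x, y
    return x


# Stand-in table: the normal distribution function at spacing 0.25.
nd = NormalDist()
grid = [0.25 * i for i in range(21)]
table = [nd.cdf(v) for v in grid]
q = inverse_interpolate(grid, table, 0.975)
print(q, nd.inv_cdf(0.975))
```

With $u = 1$ the loop stops once the error in $x$ is of the order of a few units in the fourth decimal, consistent with the accuracy analysis below.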

Once $R(p, \nu, \gamma_{p,\alpha})$ has been found, the critical value $Q(p, \nu, \alpha)$ for Duncan's test is determined as follows: $Q(p, \nu, \alpha) = R(p, \nu, \gamma_{p,\alpha})$ for $p = 2$, and $Q(p, \nu, \alpha) = \max[R(p, \nu, \gamma_{p,\alpha}),\, Q(p - 1, \nu, \alpha)]$ for $p > 2$. The results are given in Table 1.
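The rule for $p > 2$ is a running maximum over $p$, which keeps the critical values from decreasing as $p$ increases. A one-function sketch, with hypothetical $R$ values:

```python
from itertools import accumulate


def duncan_q(r_values):
    """Critical values Q(p) from R(p) for p = 2, 3, ...: Q(2) = R(2) and
    Q(p) = max(R(p), Q(p - 1)), i.e. a running maximum over p."""
    return list(accumulate(r_values, max))


# Hypothetical R values for p = 2..6 that dip at p = 5.
print(duncan_q([2.77, 2.92, 3.02, 3.01, 3.10]))
```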

Values for $\nu = \infty$, obtained by inverse interpolation in the table of the probability integral of the range, are included for convenience in interpolation (linear harmonic $\nu$-wise interpolation is recommended).
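Harmonic interpolation in $\nu$ means interpolating linearly in $1/\nu$. This also lets the $\nu = \infty$ row be used directly, since $1/\nu = 0$ there. A minimal sketch; the tabular values are hypothetical.

```python
import math


def harmonic_interp(nu, nu0, q0, nu1, q1):
    """Interpolate a tabulated value linearly in 1/nu between nu0 and nu1.

    Pass math.inf for an infinite number of degrees of freedom (1/inf = 0).
    """
    t, t0, t1 = 1.0 / nu, 1.0 / nu0, 1.0 / nu1
    return q0 + (q1 - q0) * (t - t0) / (t1 - t0)


# Hypothetical tabular values.
print(harmonic_interp(48, 40, 2.86, 60, 2.83))
print(harmonic_interp(240, 120, 2.80, math.inf, 2.77))
```

For $\nu = 48$ between 40 and 60, and for $\nu = 240$ between 120 and $\infty$, the weight in $1/\nu$ is exactly one half. Ordinary linear interpolation in $\nu$ would give different answers.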


ACCURACY OF THE TABLE 


The table of the probability integral of the studentized range, on which the table of critical values for Duncan's test is based, is accurate to within a unit in the sixth decimal place (except for probabilities greater than 0.999995, which are given as 1.00000), and the interval is small enough to make interpolation possible. The tolerance for the direct interpolation was set at $5 \times 10^{-7}$ so that the interpolation error would not add appreciably to the error already present. Hence the interpolated values are substantially as accurate as the values in the input table. Inverse interpolation is, of course, not as accurate as direct interpolation, the error being $\Delta Q/\Delta P$ times as great. Thus the tolerance for $P$ was found by multiplying the tolerance for $Q$ ($5 \times 10^{u-5}$) by $1/(\Delta Q/\Delta P) = \Delta P/\Delta Q$. Since $u$ is defined as the number of digits before the decimal point in the studentized range interval under consideration, this would guarantee that the error in $Q$ would not exceed 5 units in the fifth significant digit, provided the ratio of the change in $P$ to the change in $Q$ were constant throughout the interval. This condition ($P$ piecewise linear in $Q$) is obviously not satisfied in practice. However, as long as the weaker condition

$$\max\left[\frac{\Delta P_0}{\Delta Q_0}, \frac{\Delta P_1}{\Delta Q_1}\right] < 2\,\frac{\Delta P}{\Delta Q}$$

is satisfied, where $\Delta P_i = |y - y_i|$ and $\Delta Q_i = |x - x_i|$ ($i = 0, 1$), the error in $Q$ will not exceed a unit in the fourth significant digit. This weaker condition is in fact satisfied. Hence the error in the critical values for Duncan's test given in Table 1 does not exceed a unit in the fourth and last significant digit.
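The sufficient condition above is easy to check once the iteration has stopped. A small sketch, again using a normal-distribution table as a stand-in, with an illustrative function name:

```python
from statistics import NormalDist


def slope_condition(x0, y0, x1, y1, x, y):
    """True if max(dP0/dQ0, dP1/dQ1) < 2 dP/dQ, where dP_i = |y - y_i|,
    dQ_i = |x - x_i| and dP/dQ is the slope over the whole bracket."""
    overall = (y1 - y0) / (x1 - x0)
    s0 = abs(y - y0) / abs(x - x0)
    s1 = abs(y - y1) / abs(x - x1)
    return max(s0, s1) < 2 * overall


# Stand-in table: the normal distribution function on the bracket [1.75, 2.00].
cdf = NormalDist().cdf
print(slope_condition(1.75, cdf(1.75), 2.00, cdf(2.00), 1.96, cdf(1.96)))
```

A strongly curved bracket fails the check. For example, a table rising from 0 to 1 on $[0, 1]$ that already reaches 0.5 at $x = 0.1$ gives a partial slope of 5 against an overall slope of 1.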
154 NOTE: On an Alternative Method of Computing
Tukey's Statistic for the Latin
Square Model


John K. Abraham¹
University of California, Berkeley, California, U. S. A.

1. INTRODUCTION


A statistic first proposed by J. W. Tukey [1] is often used to test for additivity of factors. The purpose of this note is to present an alternative method of computing this statistic, and to apply it to the example given in [2].
2. DEFINITION OF TUKEY'S STATISTIC

Denote the observations by $y_{ijk}$. Then, under the assumption of additivity,

$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \epsilon_{ijk} \qquad (i, j, k = 1, 2, \ldots, n),$$

where $\mu$, $\alpha_i$, $\beta_j$ and $\gamma_k$ are parameters for the mean, row, column and treatment effects respectively, with $\sum \alpha_i = \sum \beta_j = \sum \gamma_k = 0$. The $\epsilon_{ijk}$'s are normally and independently distributed about a mean of zero with variance $\sigma^2$. Let $\hat\mu$, $\hat\alpha_i$, $\hat\beta_j$ and $\hat\gamma_k$ be the usual least squares estimates, and put

$$\hat y_{ijk} = \hat\mu + \hat\alpha_i + \hat\beta_j + \hat\gamma_k, \qquad s_{ijk} = \hat y_{ijk}^{\,2}.$$

Define $\hat s_{ijk}$ as the value fitted to $s_{ijk}$ by the same additive least squares model. Then Tukey's statistic may be written as

$$T^2 = \frac{\left[\sum y_{ijk}\,(s_{ijk} - \hat s_{ijk})\right]^2}{\sum (s_{ijk} - \hat s_{ijk})^2}, \qquad (1)$$

where the summation extends over all $n^2$ cells of the array. $T^2$ then separates one degree of freedom from the original error sum of squares. A test of the hypothesis of additivity may be made by comparing $T^2$ with the residual mean square calculated after its removal.

¹Prepared with partial support of the Office of Ordnance Research, U. S. Army under Contract DA-04-200-ORD-171, Task Order 3.

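3. CALCULATION OF TUKEY'S STATISTIC


To calculate Tukey's statistic it is sufficient to consider $s_{ijk} - \hat s_{ijk}$. Performing the algebraic computations, it is seen that

$$s_{ijk} - \hat s_{ijk} = 2\left[\hat\alpha_i\hat\beta_j + \hat\alpha_i\hat\gamma_k + \hat\beta_j\hat\gamma_k - \frac{1}{n}\left(\sum{}^{(i)}\hat\beta_j\hat\gamma_k + \sum{}^{(j)}\hat\alpha_i\hat\gamma_k + \sum{}^{(k)}\hat\alpha_i\hat\beta_j\right)\right], \qquad (2)$$

where $\sum^{(i)}\hat\beta_j\hat\gamma_k$ denotes the sum, over all cells in row $i$, of the product of the column $j$ estimate with the treatment estimate for cell $ijk$. For example, in a $3 \times 3$ array of estimates, $\sum^{(1)}\hat\beta_j\hat\gamma_k$ would denote the sum of the three products $\hat\beta_j\hat\gamma_k$ taken along the first row, each $\hat\gamma_k$ being the estimate for the treatment in cell $(1, j)$.

Similar definitions hold for the other two sums. All the ensuing numerical calculations are based on formula (2).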


4. NUMERICAL CALCULATIONS: EXAMPLE 


The following example (n = 5) was given in Biometrics [2], by 
J. Tukey, concerning the responses of five animals, each subjected to 
five conditions, during five periods of one week each. The corresponding 
treatment numbers have been inserted in the tables. It is to be noted 
that in order to avoid fractions, the effect totals rather than the means 
are considered in Table 2. The calculations may be arranged in a series 
of tables as follows: 


TABLE 1
OBSERVED VALUES WITH CORRESPONDING TREATMENT NUMBERS
(treatment number in parentheses)

                          Columns
Rows       1          2          3          4          5
  1     194 (2)    369 (4)    344 (3)    380 (1)    693 (5)
  2     202 (4)    142 (2)    200 (1)    356 (5)    473 (3)
  3     335 (3)    301 (1)    439 (5)    338 (2)    528 (4)
  4     515 (5)    590 (3)    552 (2)    677 (4)    546 (1)
  5     184 (1)    421 (5)    355 (4)    284 (3)    366 (2)


TABLE 2
VALUES OF 25α̂_i, 25β̂_j AND 25γ̂_k CORRESPONDING TO TREATMENT NUMBERS, ROWS AND COLUMNS
(body entries are 25γ̂_k for the treatment in each cell; treatment number in parentheses)

                               Columns
Rows        1           2           3           4           5        25α̂_i
  1    -1824 (2)     871 (4)     346 (3)   -1729 (1)    2336 (5)       116
  2      871 (4)   -1824 (2)   -1729 (1)    2336 (5)     346 (3)     -2919
  3      346 (3)   -1729 (1)    2336 (5)   -1824 (2)     871 (4)       -79
  4     2336 (5)     346 (3)   -1824 (2)     871 (4)   -1729 (1)      4616
  5    -1729 (1)    2336 (5)     871 (4)     346 (3)   -1824 (2)     -1734
25β̂_j  -2634        -669        -334         391        3246
TABLE 3
VALUES OF c(α̂_iβ̂_j + α̂_iγ̂_k + β̂_jγ̂_k) AND SUMS
Here c = 10^-3, and entries are rounded off to the nearest integer.
(treatment number in parentheses)

                                Columns                                Row sums
Rows        1           2           3           4           5           Σ^(i)
  1     4287 (2)    -559 (4)    -114 (3)    -831 (1)    8230 (5)        11,013
  2     2852 (4)    8497 (2)    6599 (1)   -7047 (5)   -9362 (3)         1,539
  3     -731 (3)    1346 (1)    -938 (5)    -600 (2)    2502 (4)         1,579
  4    -7529 (5)   -1722 (3)   -9352 (2)    6166 (4)    1390 (1)       -11,047
  5   12,120 (1)   -4453 (5)   -1222 (4)   -1143 (3)   -8386 (2)        -3,084
Column sums
Σ^(j)  10,999        3109       -5027       -3455       -5626               0

Treatment sums Σ^(k):  k = 1: 20,624   k = 2: -5,554   k = 3: -13,072   k = 4: 9,739   k = 5: -11,737   (total 0)


TABLE 4
VALUES OF (1/n)(Σ^(i) + Σ^(j) + Σ^(k)) AND SUMS
(treatment number in parentheses)

                                Columns                                Row sums
Rows        1           2           3           4           5
  1     3292 (2)    4772 (4)   -1417 (3)    5636 (1)   -1270 (5)        11,013
  2     4455 (4)    -181 (2)    3427 (1)   -2731 (5)   -3432 (3)         1,538
  3      -99 (3)    5062 (1)   -3037 (5)   -1486 (2)    1138 (4)         1,578
  4    -2357 (5)   -4202 (3)   -4326 (2)    -953 (4)     790 (1)       -11,048
  5     5708 (1)   -2342 (5)     326 (4)   -3922 (3)   -2853 (2)        -3,083
Column sums
       10,999        3109       -5027       -3456       -5627              -2

Treatment sums:  k = 1: 20,623   k = 2: -5,554   k = 3: -13,072   k = 4: 9,738   k = 5: -11,737
TABLE 5
TABLE 3 MINUS TABLE 4
(treatment number in parentheses)

                                Columns                                Row sums
Rows        1           2           3           4           5
  1      995 (2)   -5331 (4)    1303 (3)   -6467 (1)    9500 (5)             0
  2    -1603 (4)    8678 (2)    3172 (1)   -4316 (5)   -5930 (3)             1
  3     -632 (3)   -3716 (1)    2099 (5)     886 (2)    1364 (4)             1
  4    -5172 (5)    2480 (3)   -5026 (2)    7119 (4)     600 (1)             1
  5     6412 (1)   -2111 (5)   -1548 (4)    2779 (3)   -5533 (2)            -1
Column sums
           0           0           0           1           1                2

Sums over treatments k:  k = 1: 1   k = 2: 0   k = 3: 0   k = 4: 1   k = 5: 0


It is convenient to introduce the abbreviation $\sum^{(i)}$ for $\sum^{(i)}\hat\beta_j\hat\gamma_k$, and similarly $\sum^{(j)}$ and $\sum^{(k)}$.

$T^2$ is then computed from (1). For the numerator it is sufficient to multiply each entry in Table 1 by the corresponding entry in Table 5, add, and square the result. The denominator is equal to the sum of squares of the entries in Table 5. In this case

$$T^2 = \frac{(290{,}658)^2}{521{,}675 \times 10^3} \approx 162.$$


The entire analysis of variance table, calculated with the aid of Table 2, appears as follows:

Effect            Degrees of freedom    Sum of squares    F ratio
rows                      4                 262,836         16.3
columns                   4                 145,492          9.05
treatments                4                 101,213          6.29
non-additivity            1                     162          0.04
error                    11                  44,229
Total                    24                 553,932
In this example, since the F ratio for non-additivity is very small, the hypothesis of additivity is not rejected.

The effect of rounding in this case is small, with an error of about one, about the same as in [2]. In Table 3, each entry may be multiplied by a convenient constant $c$ without affecting the computed value of $T^2$.

It should be noted that, with no rounding in Table 3, the sum of all entries is 0. In Tables 2 and 5 the sum over each row, column, and treatment is 0, and the marginal sums in Tables 3 and 4 are the same, namely $\sum^{(i)}$, $\sum^{(j)}$, and $\sum^{(k)}$. These facts provide several checks on the computations.
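The whole calculation can be cross-checked with a short NumPy sketch. This is our own check, not the note's procedure. It fits the additive Latin-square model, forms $s = \hat y^2$ and its additive fit directly, confirms that the residuals agree with formula (2), and evaluates $T^2$ from (1). The data and treatment layout are those of Table 1. The exact arithmetic should reproduce the analysis of variance above up to the rounding discussed in the text.

```python
import numpy as np

# Observed values and treatment numbers (Table 1).
y = np.array([
    [194, 369, 344, 380, 693],
    [202, 142, 200, 356, 473],
    [335, 301, 439, 338, 528],
    [515, 590, 552, 677, 546],
    [184, 421, 355, 284, 366],
], dtype=float)
trt = np.array([
    [2, 4, 3, 1, 5],
    [4, 2, 1, 5, 3],
    [3, 1, 5, 2, 4],
    [5, 3, 2, 4, 1],
    [1, 5, 4, 3, 2],
]) - 1  # zero-based treatment index
n = y.shape[0]


def additive_fit(v):
    """Least-squares row + column + treatment fit for the n x n Latin square."""
    mu = v.mean()
    a = v.mean(axis=1) - mu                                    # row effects
    b = v.mean(axis=0) - mu                                    # column effects
    g = np.array([v[trt == k].mean() for k in range(n)]) - mu  # treatment effects
    return a, b, g, mu + a[:, None] + b[None, :] + g[trt]


a, b, g, yhat = additive_fit(y)

# Tukey's construction: residuals of s = yhat**2 from their own additive fit.
s = yhat ** 2
d = s - additive_fit(s)[3]

# The same residuals from formula (2), using only the effect estimates.
A, B, G = a[:, None], b[None, :], g[trt]
row_bg = (B * G).sum(axis=1)                                    # sum over row i of beta_j gamma_k
col_ag = (A * G).sum(axis=0)                                    # sum over column j of alpha_i gamma_k
trt_ab = np.array([(A * B)[trt == k].sum() for k in range(n)])  # sum over treatment k of alpha_i beta_j
d_formula = 2 * (A * B + A * G + B * G
                 - (row_bg[:, None] + col_ag[None, :] + trt_ab[trt]) / n)

ss_nonadd = (y * d).sum() ** 2 / (d ** 2).sum()   # T^2 from Eq. (1)
ss_rows = n * (a ** 2).sum()
ss_error = ((y - yhat) ** 2).sum() - ss_nonadd
print(round(ss_rows), round(ss_nonadd), round(ss_error))
```

Checking formula (2) against a direct additive fit of $s$ is a convenient way to catch transcription errors in the layout before any table is built by hand.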


REFERENCES 
[1] Tukey, J. W. [1949]. One degree of freedom for non-additivity. Biometrics 5, 232-242.
Some of the work involved can be saved by taking advantage of the properties of the intrablock subgroup. Yet, when m is large and a number of various-order interactions are confounded, the problem of determining the intrablock subgroup becomes a tedious one. The aim of this paper is to explain a relatively simple direct method for the same.


A METHOD FOR CONSTRUCTING THE INTRABLOCK SUBGROUP 


Here, we shall specifically follow the treatment in Kempthorne [2], Section 17.1. We will deal with the case when $p$ is prime. Let $a_1, \ldots, a_n$ be the $n$ factors under consideration and $X_1, \ldots, X_m$ a set of $m$ independent interactions confounded with the blocks. $X_i$ will be of the form $A_1^{\alpha_{i1}} A_2^{\alpha_{i2}} \cdots A_n^{\alpha_{in}}$, where the $\alpha$'s are integers lying between 0 and $p - 1$. Denote a treatment combination $a_1^{x_1} a_2^{x_2} \cdots a_n^{x_n}$ by $(x_1, \ldots, x_n)$. From Kempthorne [2], p. 318, we note that the treatment combinations in the intrablock subgroup satisfy the equations

$$\sum_{j=1}^{n} \alpha_{ij} x_j \equiv 0 \pmod{p}, \qquad i = 1, \ldots, m. \qquad (1)$$

Since the $\alpha$'s are integers between 0 and $p - 1$, it is clear that if $(x_1, \ldots, x_n)$ is any root of (1), we can find a vector $(x_1^*, \ldots, x_n^*)$ such that

$$x_j^* = x_j + k_j p, \qquad j = 1, \ldots, n, \qquad (2)$$

where $k_j$ is a positive or negative integer or zero, and $(x_1^*, \ldots, x_n^*)$ satisfies the equations

$$\sum_{j=1}^{n} \alpha_{ij} x_j^* = 0, \qquad i = 1, \ldots, m, \qquad (3)$$

(without the modulus). Conversely, if $(x_1^*, \ldots, x_n^*)$ is a vector with integral entries satisfying (3), then the vector $(x_1, \ldots, x_n)$ with $x_j \equiv x_j^* \pmod{p}$ satisfies (1). Hence, we observe that the intersections of the planes in (3) define all the treatments in the intrablock subgroup (though not uniquely).

Let $A_1$ denote the $m \times n$ matrix $[\alpha_{ij}]$. Complete $A_1$ by the addition of an $(n - m) \times n$ matrix $A_2$ to form a nonsingular matrix $A$. Consider the change of axes from $X' = (x_1, \ldots, x_n)$ to $Z' = (z_1, \ldots, z_n)$ through the relation

$$Z = AX. \qquad (4)$$

From (4), $X = A^{-1}Z$. Notice that the planes $z_i = 0$, $i = 1, \ldots, m$, correspond to the planes given by (3). In terms of the coordinates $z_1, \ldots, z_n$, the points $P_i = (0, \ldots, 0, 1, 0, \ldots, 0)'$, $i = 1, \ldots, n - m$, where the 1 is in the $(m + i)$th position, lie on all the planes $z_l = 0$, $l = 1, \ldots, m$. Hence, in terms of the $x$'s, the points $A^{-1}P_i$, $i = 1, \ldots, n - m$, lie on the planes (3). From the definition of $P_i$, this implies that the points $(A^{-1})_{\cdot i}$, $i = m + 1, \ldots, n$, lie on the planes (3), where $(A^{-1})_{\cdot i}$ is the $i$th column of $A^{-1}$. The above discussion then implies that the treatment combinations, apart from a multiple of $p$, are obtained by forming those linear combinations of the vectors $(A^{-1})_{\cdot i}$, $i = m + 1, \ldots, n$, that lead to vectors with integral coordinates.
Thus, the intrablock subgroup is obtained by the following procedure:

(1) Construct $A$.

(2) Obtain $A^{-1}$ and reduce the last $n - m$ columns of $A^{-1}$ to vectors corresponding to a set of independent treatment combinations, by taking their linear combinations and reducing them modulo $p$.

(3) Form the generalized treatment combinations by taking all possible products of the independent treatment combinations and replacing the powers of the factors by their residues modulo $p$.

The set of different treatment combinations given by this method forms the intrablock subgroup. The only step that might lead to elaborate work is the inversion of $A$. Suppose we select $A_2$ so that its rows correspond to $(n - m)$ independent interactions, independent also of $X_1, \ldots, X_m$. Then the elements of $A$ are integers lying in the range $(0, p - 1)$, and $A^{-1}$ can be obtained through direct operations on rows (cf. Frazer et al. [1], p. 119). The matrix $A$ can be put in a much simplified form by choosing suitable interactions. For $A_1$, select $m$ of the smallest-order interactions from the set of all interactions that are being confounded. For $A_2$, select $(n - m)$ of the smallest-order interactions that are not being confounded.
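The three-step procedure can be sketched in NumPy for the $2^7$ example given below. This is a minimal illustration, not a general implementation. It assumes $A$ is unimodular ($\det A = \pm 1$), so that the last $n - m$ columns of $A^{-1}$ are already integral and only need reduction mod $p$; otherwise the integral linear combinations described above must be found first. The choice of $A_1, A_2, A_4, A_6$ for the rows of $A_2$ follows our reading of the example. The names `gens` and `label` are illustrative.

```python
import itertools
import numpy as np

p = 2
# A1: exponents of A1..A7 in the confounded interactions A1A2A3, A4A5A6A7, A3A4A5.
A1 = np.array([
    [1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 0, 0],
])
# A2: the unconfounded main effects A1, A2, A4, A6 used to complete A.
A2 = np.array([
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
])
A = np.vstack([A1, A2])                       # step (1)
m, n = A1.shape

# Step (2): invert A and reduce its last n - m columns mod p.
# Assumes det(A) = +/-1, so those columns are already integral.
A_inv = np.linalg.inv(A)
gens = np.rint(A_inv[:, m:]).astype(int) % p  # one generator per column

# Step (3): all products of the generators, i.e. all combinations mod p.
subgroup = set()
for coeffs in itertools.product(range(p), repeat=n - m):
    x = (np.array(coeffs) @ gens.T) % p
    subgroup.add(tuple(int(v) for v in x))


def label(x):
    """Write (x1, ..., xn) as a1^x1 ... an^xn, omitting zero exponents."""
    parts = [f"a{i + 1}" + (f"^{e}" if e > 1 else "") for i, e in enumerate(x) if e]
    return "".join(parts) or "(1)"


print([label(g) for g in gens.T])
print(len(subgroup), sorted(label(x) for x in subgroup))
```

For a check on small cases, the brute-force alternative is to test every one of the $p^n$ treatment combinations against (1). The matrix method avoids this enumeration, which becomes impractical as $p$ and $n$ grow.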


EXAMPLE 


Consider a $2^7$ experiment in $a_1, \ldots, a_7$. Suppose that it is decided to confound $A_1A_2A_3$, $A_4A_5A_6A_7$ and $A_3A_4A_5$ so as to be able to run the experiment in blocks of size $2^4$. Then

$$A_1 = \begin{pmatrix} 1&1&1&0&0&0&0 \\ 0&0&0&1&1&1&1 \\ 0&0&1&1&1&0&0 \end{pmatrix}.$$

Note that the interactions (of order one in this case) $A_1$, $A_2$, $A_4$ and $A_6$ are independent of each other and also of the confounded interactions. Therefore, we take

$$A_2 = \begin{pmatrix} 1&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0 \\ 0&0&0&1&0&0&0 \\ 0&0&0&0&0&1&0 \end{pmatrix}.$$

By using the method of Frazer et al. [1], p. 119, for the inversion of matrices, we obtain

$$A^{-1} = \begin{pmatrix}
0&0&0&1&0&0&0 \\
0&0&0&0&1&0&0 \\
1&0&0&-1&-1&0&0 \\
0&0&0&0&0&1&0 \\
-1&0&1&1&1&-1&0 \\
0&0&0&0&0&0&1 \\
1&1&-1&-1&-1&0&-1
\end{pmatrix}.$$

Obviously, $-1 \bmod 2 = +1$. Hence, on the basis of columns 4, 5, 6 and 7, we have $a_1a_3a_5a_7$, $a_2a_3a_5a_7$, $a_4a_5$ and $a_6a_7$ as a set of 4 independent treatment combinations forming the intrablock subgroup. The complete set obtained by using the method explained in (3) is (1), $a_1a_3a_5a_7$,


CONCLUSION 


Thus, here we have a direct method in which the task of selecting, from the set of $p^n$ treatment combinations, those that form the intrablock subgroup has been reduced to that of inverting a matrix of order $n \times n$ that does not involve $p$. It may be used with advantage for large $p$, when selecting the treatment combinations by trial and error is very cumbersome.


REFERENCES 
[1] Frazer, R. A., Duncan, W. J. and Collar, A. R. [1950]. Elementary Matrices. Cambridge University Press, Cambridge.

[2] Kempthorne, O. [1952]. The Design and Analysis of Experiments. John Wiley and Sons, New York.


NOTE ADDED IN PROOF 
As this issue of Biometrics was in page proof, the following reference 
was brought to the author’s attention: 
Bailey, Norman T. J. [1959]. Use of linear algebra in deriving prime power 
factorial designs with confounding and factorial replication. Sankhya 21, 345-51. 


Bailey has also attempted to give a systematic method for obtaining 
the intrablock subgroup. However, he has not given any explicit 
solution for the intrablock subgroup. 
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CORRECTION 


to Note 143 “Ona 5 X 2’ Factorial Design” 


In the above note by B. V. Shah, published in Biometrics, March 
1960, in the set of four formulae for treatment effects given almost 
at the end of the note, small brackets before Q and after J; should 
be inserted to read 


£(:00) = + Ji), etc.


ACKNOWLEDGMENT 


In “Analysis of Quadruple Rectangular Lattice Designs”, published 
in Biometrics, vol. 15, no. 1, 74-86 (1959), reference should have been 
made to the following papers by Dr. P. M. Roy: 


1. Roy, Purnendu Mohon, “Rectangular Lattices and Orthogonal
Group Divisible Designs”, Calcutta Statistical Association
Bulletin, Vol. 5, No. 18, March 1954.


2. Roy, Purnendu Mohon, “Analysis of p × (p − 1), n-ple Latinized
Rectangular Lattices and their Multiples”, Calcutta Statistical
Association Bulletin, Vol. 6, No. 28, December 1955.


3. Roy, Purnendu Mohon, “On the Distribution of Varieties of
p × (p − 1), n-ple Latinized Rectangular Lattices and Weighted
Average Variances”, Calcutta Statistical Association Bulletin, Vol.
7, No. 27, July 1957.


The work by Dr. Roy covers a general class of lattices, including the
quadruple rectangular lattice. I am indebted to Dr. Roy for noting
an error on page 79. The expression in the last line should read


AW + KW + (k — 
BOOK REVIEWS


J. G. Skellam, Editor
Members are Invited to Suggest Books for Listing or Review to the Editor 


4 COX, D. R. Planning of Experiments. New York: John Wiley & Sons Inc.
London: Chapman & Hall. 1958. pp. vii + 308. Tables, Figures and Diagrams.
60s.


H. D. Patterson, Rothamsted Experimental Station, Harpenden, England.


Dr. Cox’s aim in this book has been to provide an account of the statistical
aspects of experimental design which would be intuitively acceptable to experimental
workers. In this he has succeeded admirably, and many experimentalists who have
seen the book have praised it highly, particularly for the excellent way in which it
explains the logic of procedures which they have long followed in the field. Several
workers have specially commented on the excellent account of randomization (in
particular the rejection of unsatisfactory randomizations) and on the distinctions
drawn between the different types of factors which may arise in factorial experi-
ments; the essentially practical outlook of the discussion of these topics is typical
of the whole book.

The book should be useful also to professional statisticians as an adjunct to 
formal training in specialised techniques. It will help them to play a more useful 
part in those wider problems of experimental design whose importance has been 
stressed by F. Yates (J. Ind. Soc. Agric. Stat., 5, 109-18) and D. J. Finney (J. R. 
Statist. Soc. Series A, 1-27). 

The exclusion of mathematical details and the abundance of real-life examples
make the book extremely readable and no doubt more attractive to biological
research workers. The examples are drawn from many fields of work. Those con-
cerned with agricultural experimentation, with which the reviewer is most familiar,
are for the most part excellent, though the choice of one of the examples on the use
of split plot designs is perhaps a little unfortunate. If grazing treatments are accom-
modated on whole plots and fertilizer treatments on subplots in a grassland experi-
ment, the experimenter will in effect have lost control over the levels of the grazing
factor to an extent depending on the selectivity of the animals. This is indeed a
small point of criticism in considering the book as a whole, but it emphasises the
point that designs must be chosen to suit the treatments.

Adequate references are given for those interested in pursuing individual topics 
or requiring full details of the examples. For some reason, however, the original 
papers on Latin Square and incomplete block designs appear to have been over- 
looked and there is no reference to the tables of these designs given by R. A. Fisher 
and F. Yates in Statistical Tables for Biological, Agricultural and Medical Research. 

The book has already been well received in several journals. It only remains 
for the present reviewer to add his own unreserved recommendation. 
5 HELVEY, T. C. Effects of Nuclear Radiation on Men and Materials. New York:
John F. Rider Publisher Inc. London: Chapman & Hall, Ltd. 1959. pp. v + 56,
Tables and Diagrams. 15s.


A few basic concepts on nuclear fission are presented in a simple diagrammatic 
manner, and the effects of radiation on man and materials are briefly described. 

The last chapter deals in general terms with an interesting problem on shield 
configuration, namely, the achievement of adequate protection for the crew of a 
nuclear-powered aircraft without creating aeronautical difficulties by imposing an 
intolerable distribution of heavy shielding material. A partial solution is advanced 
in terms of two shield barriers, one around the reactor and the other around the 
crew compartment. 


J.G.S. 


6 HIGLEY, H. G. et al. The Intervertebral Disc Syndrome. Iowa: The National 
Chiropractic Association, Webster City. 1960. pp. 120. 30 Tables. 3s. 


In this monograph, which includes an excellent bibliography of 897 references,
all available information has been compiled for use as a basis for the design of a
long-term study of disc pathology and the value of various therapies. Though
primarily concerned with the dissection of the problem into its many medical facets,
the report deals with the need for definition both in diagnosis and in selecting the
dependent variables, considers criteria for the omission of statistical data, and
includes numerous tables classifying the available statistics in various ways. Apart
from the use of linear regression, the main statistical method employed is that of
A. W. Kimball 1954 (Biometrics 10: 452-458) involving the partition of χ² in a
contingency table.
ABSTRACTS


Papers presented at the joint meeting of the Biometric
Society ENAR and the American Institute of Biological
Sciences in Stillwater, Oklahoma on August 31, 1960.


682 REX L. HURST (Statistical Laboratory, Utah State University). The Design,
Analysis and Interpretation of Greenhouse and Laboratory Experiments.


Statistics enters, primarily, into three phases of a research project. These are: 
clarification and logic of objectives, control and identification of sources of variation, 
and the analysis and interpretation of results. 

The experimental design includes the selection of the treatments to be studied and
of the way external variation is to be controlled. Each treatment included in the study
should have a clear-cut position with respect to achieving the objectives. The control
of variation may be attempted by design or by measurement of pertinent auxiliary
variates. Unmeasurable sources of variation can often be reduced by various types
of blocking. When measurements can be made on auxiliary variates, mathematical
methods of reducing variation can be used. These methods, in addition to reducing
variation, often provide basic information pertaining to the research.
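The abstract does not name the mathematical methods it has in mind. Analysis of covariance is assumed here as the most familiar one. The Python sketch below is a minimal illustration under that assumption. It pools the within-treatment regression of the response y on an auxiliary variate x, then compares the error mean square before and after adjustment. The data are synthetic and generated only for illustration, and the function name is hypothetical.

```python
import random
from statistics import mean


def covariance_adjustment(groups):
    """Pooled within-treatment regression of y on an auxiliary variate x.

    groups: one (y, x) pair of equal-length lists per treatment.
    Returns the pooled slope, the unadjusted error mean square, and the
    error mean square after adjustment (one further degree of freedom used).
    """
    sxx = sxy = syy = 0.0
    n_total = 0
    for y, x in groups:
        xbar, ybar = mean(x), mean(y)
        sxx += sum((xi - xbar) ** 2 for xi in x)
        sxy += sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        syy += sum((yi - ybar) ** 2 for yi in y)
        n_total += len(y)
    df_error = n_total - len(groups)
    slope = sxy / sxx
    error_ms = syy / df_error
    adjusted_ms = (syy - sxy ** 2 / sxx) / (df_error - 1)
    return slope, error_ms, adjusted_ms


# Synthetic illustration only: 3 treatments, 10 pots each, where the response
# also depends on an auxiliary measurement x (e.g. initial plant size).
rng = random.Random(1)
groups = []
for effect in (0.0, 2.0, 4.0):
    x = [rng.uniform(10, 30) for _ in range(10)]
    y = [5 + effect + 0.8 * xi + rng.gauss(0, 1) for xi in x]
    groups.append((y, x))

slope, error_ms, adjusted_ms = covariance_adjustment(groups)
print(round(slope, 2), round(error_ms, 1), round(adjusted_ms, 1))
```

The fitted slope is itself an example of the "basic information" the abstract mentions. It estimates how strongly the auxiliary variate drives the response.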

The presentation and interpretation of results is of great importance. There
is a tendency for too many research workers to evaluate their results from a purely
statistical point of view. Some workers even fail to distinguish between statistical
significance and practical or technical importance. The current emphasis on
response surfaces provides one effective means of both mathematical and
graphical presentation.
THE BIOMETRIC SOCIETY


International 
The membership of the Society at the end of 1959 was as follows:— 
Australasian 73 India 29 
Belgian and Belgian Congo 85 Italian 90 
Brazilian 56 Japan 50 
British 207 Netherlands 42 
Denmark 14 Sweden 13 
E.N.A.R. 794 Switzerland 32 
French 77 W.N.A.R. 151 
German 113 At large 77 


The total of 1,903 represents an increase of 451 over the previous year. 
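The regional figures in the table can be checked against the stated total. The short tally below copies the sixteen entries from the table. The previous year's figure is not given in the text; it is simply inferred from the stated increase of 451.

```python
membership_1959 = {
    "Australasian": 73, "India": 29, "Belgian and Belgian Congo": 85, "Italian": 90,
    "Brazilian": 56, "Japan": 50, "British": 207, "Netherlands": 42,
    "Denmark": 14, "Sweden": 13, "E.N.A.R.": 794, "Switzerland": 32,
    "French": 77, "W.N.A.R.": 151, "German": 113, "At large": 77,
}
total = sum(membership_1959.values())
previous_year = total - 451  # implied by the stated increase, not given directly
print(total, previous_year)  # 1903 1452
```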

The members of Council for the term 1960-62 are Dr. C. I. Bliss (ENAR), 
Dr. D. J. Finney (BR), Prof. A. Linder (Switzerland), Dr. P. V. Sukhatme (RItl), 
Prof. G. Teissier (RF) and Dr. F. Yates (BR). 


GENERAL TREASURER 


Dr. A. W. Kimball has resigned the post of General Treasurer on taking up a 
new post. Appointed in 1957 in succession to Dr. C. I. Bliss, Dr. Kimball has 
overseen the growth of the Society to a membership of nearly 2,000 and has main- 
tained it in a very strong financial position in spite of the low subscription rate. 
Dr. Kimball’s place is being taken by 


Dr. M. A. Kastenbaum, 
The Biometric Society, 
P. O. Box 2017, 

Oak Ridge, Tennessee, 


Région Française


M. Ph. L’Heritier has been elected Regional President. 

At a meeting held on May 24th, 1960, the following paper was presented:

J. Lellouch and D. Schwartz: Comparaison de 2 groupes pour une variable en
corrigeant de l’influence d’une autre variable (Comparison of two groups for one
variable, correcting for the influence of another variable).


Brazilian Region 


The Brazilian Region of the Biometric Society held its 6th meeting at the
Escola Superior de Agricultura “Luiz de Queiroz”, Piracicaba, São Paulo, Brazil,
on April 12, 1960. A scientific session was held at 9 a.m. Papers given included
“Experiments on time of planting and densities for onions” by Prof. J. T. A. Gurgel
and his assistants; “Grouping and contrasts in the statistical analysis” by Prof.
F. G. Brieger; “Modern methods for the comparison of means” by F. Pimentel Gomes;
and “Considerations on the teaching of statistics in the Universities” by J. M. Pompeu
Memoria.

The annual business meeting was held at 2 p.m. Officers elected for the 1960-61
term were: President, A. M. Penha; Secretary, A. Conagin; Treasurer, A. Grosz-
mann. Council Members: F. Pimentel Gomes, R. A. Silva Leme, F. G. Brieger, G.
Garcia Duarte, C. G. Fraga Jr. and J. M. Pompeu Memoria.


Deutsche Region 
The officers for 1960 are:— 


President—O. Heinisch 


Secretary—R. Wette 
Treasurer—M. P. Geppert 


Japan 
The following papers have been presented at recent meetings:— 

K. Sato: The relationship between three stimulus factors and the “transforming
action” of the brain.

K. Takahashi and T. Kashiwagi: Component analysis of autonomic nervous function.

S. Tanaka: Mathematical models for the relation between abundance of the
spawning population and recruitment of fish.

M. Masuyama: Response Pattern Analysis.

M. Masuyama: Response Pattern Analysis of Blood Pressure when Noradrenaline
is administered twice.

M. Masuyama: Dose-mortality curves.

Y. Oshima, K. Takahashi et al.: Factor analysis of autonomic regulation.

K. Sakai, S. Iyama and T. Narise: Cytoplasmic inheritance in autogamous plants.

K. Kitarsura: A transformation of experiments in randomised blocks.


JOINT MEETING OF THE BIOMETRIC SOCIETY (ENAR) AND THE 
AMERICAN INSTITUTE OF BIOLOGICAL SCIENCES, 
STILLWATER, OKLAHOMA AUGUST 31, 1960 


Program 


MODERN STATISTICAL AND COMPUTING TECHNIQUES IN 
AGRICULTURAL SCIENCE 


Chairman: Erwin L. LeClerg. C. E. Marshall: Sampling techniques in horticultural
research. R. Hurst: Design, conduct, and interpretation of greenhouse and
laboratory experiments. R. D. Morrison: Use of high-speed computers in agricultural
research.


CHANGES IN MEMBERSHIP 
(July 1-October 1, 1960) 


Changes of Address 

Mr. Raymond R. Allmaras, U.S.D.A. Field Research Station, Morris, Minnesota,
U.S.A.

Mr. Laurence H. Baker, 4013 33rd Street, Des Moines, Iowa, U.S.A.

Mr. Walter A. Becker, Department of Poultry Science, Washington State University, 
Pullman, Washington, U.S.A.

Dr. Archie Blake, 35 Hiram Road, Framingham, Massachusetts, U.S. A. 

Dr. Nils Blomqvist, Dr. Forselius gata 22, Göteborg SV, Sweden.
Mr. Roger L. Bollenbacher, 2215 W. Lexington Avenue, Elkhart, Indiana, U. S. A.

Dr. Samuel H. Brooks, General Analysis Corporation, 11753 Wilshire Blvd., Los 
Angeles 25, California, U.S. A. 

Dr. Richard Leston Carter, Department of Management Engineering, Rensselaer 
Polytechnic Institute, Troy, New York, U.S. A. 

Dr. James Lee Cason, Department of Dairy Husbandry, University of Maryland, 
College Park, Maryland, U.S. A. 

Dr. J. B. Chassan, Department of Health, Education and Welfare, Office of Educa- 
tion, Washington 25, D. C., U.S. A. 

Mrs. Virginia Clark, 29 Concord Avenue, Apt. 812, Cambridge 38, Massachusetts, 
U.S. A. 

Dr. Richard G. Cornell, Department of Statistics, Florida State University, Talla- 
hassee, Florida, U.S. A. 

Mr. Arthur S. Covert, 33 Front Street, Schenectady, New York, U.S. A. 

Mr. Marc Dalebroux, 24 Rue du Moulin, Chatelineau, Belgium.

Dr. M. Bryan Danford, SAM-2832, Brooks Air Force Base, Texas, U.S. A. 

Mr. Ira A. DeArmon, Jr., 219 Broadway, Bel Air, Maryland, U.S. A. 

Miss Martha W. Dicks, Department of Home Economics Research, Montana 
State College, Bozeman, Montana, U. S. A. 

Mr. William F. Dossett, 1220 Shoreline Drive, Santa Barbara, California, U.S.A.

Mr. A. T. Dunn, The Central Statistical Office, Cabinet Office, St. George Street, 
London S.W. 1, England.

Mr. R. C. Elston, Department of Biostatistics, University of North Carolina, Chapel 
Hill, North Carolina, U.S. A. 

Dr. Marc F. Fontaine, Trinidad House, 29/30 Old Burlington Street, London W. 1,
England. 

Dr. Friedrich Franz, Fliederstrasse 19, c/o Geier, Aalen/Wurtt, Germany.

Dr. Howard T. Fredeen, Canada Department of Agriculture, Lacombe, Alberta, 
Canada. 

Dr. J. Gani, Department of Mathematics, University of Western Australia, Nedlands, 
Western Australia. 

Mr. Granville R. Gargiulo, 37-22 80th Street, Jackson Heights 72, New York, U.S. A. 

Mr. Lincoln J. Gerende, 55 Vernon Street, Hamden 18, Connecticut, U.S. A. 

Dr. Jerome M. Glassman, Wyeth Inst. for Medical Research, P. O. Box 8299, 
Philadelphia 1, Pennsylvania, U.S. A. 

Dr. Franklin Graybill, Mathematics Department, Colorado State University, 
Fort Collins, Colorado, U.S.A.

Mr. W. B. Hall, C.S.I.R.O., 343 Royal Parade, Parkville, N 2, Victoria, Australia.

Mr. Michael J. R. Healy, Rothamsted Experiment Station, Harpenden, Herts., 
England. 

Dr. Johannes Ipsen, Jr., Phipps Institute, 4219 Chester Avenue, Philadelphia, 
Pennsylvania, U.S. A. 

Mr. Iver A. Iverson, 3828 Edgewood No., Minneapolis 27, Minnesota, U. S. A. 

Mr. George M. Jolly, Isle of Palbay, Broadford, Isle of Skye, Inverness-shire, 
Scotland. 

Dr. med. Herbert Jordan, Karl-Marx-Strasse 4, Bad Elster, Germany. 

Mr. Cecil L. Kaller, Department of Mathematics, University of Saskatchewan, 
Saskatoon, Saskatchewan, Canada. 


Dr. Leo Katz, Department of Statistics, Michigan State University, East Lansing,
Michigan, U.S.A.
Mr. George H. Kennedy, Research Grants Office, National Cancer Institute,
Bethesda 14, Maryland, U.S. A. 

Mr. Susumu Kikuchi, Faculty of Technology, Okayama University, Tsushima, 
Okayama-shi, Okayama-ken, Japan. 

Dr. A. W. Kimball, Jr., Department of Biostatistics, Johns Hopkins University, 
Baltimore, Maryland, U.S. A. 

Dr. Steven C. King, Poultry Research Branch, AHRD, ARS, USDA, Beltsville, 
Maryland, U.S. A. 

Dr. John R. Kinzer, 540 Harley Drive, Apt. 3, Columbus 2, Ohio, U. S. A. 

Dr. Richard Lang, Statistical Laboratory, University of Florida, Gainesville, 
Florida, U.S.A.

Dr. Jean Lebrun, 12 av. des Lucioles, Watermael, Belgium. 

Dr. Robert memeaaese M.P.I. f. Pflanzengenetic, Rosenhof Post Ladenburg/Neckar, 
Germany. 

Mr. Alfred Lieberman, 6125 Durbin Road, Bethesda, Maryland, U.S. A. 

Marcel J. W. Luttgens, 30 rue de Haache, Wakkerzeel, Belgium. 

Mr. Judson U. McGuire, Jr., European Parasite Laboratory, 20 bis rue Sadi Carnot, 
Nanterre (Seine) France. 

Mr. G. McLoughlin, 292 Harvard Street, Cambridge, Massachusetts, U.S. A. 

Mr. Shigeichi Moriguchi, Department of Mathematical Statistics, Columbia 
University, New York 27, N. Y., U.S. A. 

Prof. Lincoln E. Moses, 64 Sandfield Road, Headington, Oxford, England. 

Mr. Jack Nadler, Bell Telephone Laboratories, Inc., Whippany, New Jersey, U.S. A. 

Dr. Anita Rapoport, 524 Panama Avenue, Long Beach 14, California, U.S. A. 

Dr. Gisela Reissig, Bundenbacher Weg 5/1, Berlin-Weissensee, Germany. 

Professor W. E. Ricker, Fisheries Research Board of Canada, Nanaimo, British 
Columbia, Canada. 

Mr. Howard R. Roberts, 4921 Auburn Avenue, Bethesda 14, Maryland, U.S. A. 

Mr. Julien F. Ronchaine, 32 rue Saint Victor, Huy, Belgium. 

Mr. J. Rossignui, Baronville, Beauraing, Belgium. 

Dr. Vincent Schultz, Division of Biology and Medicine, U. S. Atomic Energy Com- 
mission, Washington 25, D. C., U.S. A. 

Mr. Thomas E. Sedlmayr, 704 Cherry Lane, East Lansing, Michigan, U.S.A.

Dr. Maynard W. Shelly II, 972 N. Quantico Street, Arlington 5, Virginia, U. S. A. 

Dr. Mindel C. Sheps, 305 University Square Apts., 4625 Fifth Avenue, Pittsburgh 
13, Pennsylvania, U.S. A. 

Dr. Donald V. Sisson, Department of Zoology and Entomology, Iowa State Univer- 
sity, Ames, Iowa, U. S. A. 

Dr. John H. Smith, Graduate School of Business, University of Chicago, Chicago 
37, Illinois, U.S. A. 

Dr. Robert G. D. Steel, Department of Experimental Statistics, North Carolina 
State College, Raleigh, North Carolina, U.S. A. 

Mr. Earl A. Thomas, 314 Hector Road, McLean, Virginia, U.S.A.

Mr. Peter F. Wade, Price Waterhouse and Company, 606 Cathcart Street, Montreal 
2, Quebec, Canada. 

Mr. Thierry Waffelaert, 216, Lg. Lumstraat, Anvers, Belgium. 

Dr. Rolf Wartmann, Grunerstr. 63, Schwerte/Ruhr, Germany. 

Mr. G. A. Watterson, Department of Statistics, Virginia Polytechnic Institute, 
Blacksburg, Virginia, U.S. A. 

Miss Roberta A. Wilcox, 150 N. Middleton Road, Pearl River, New York, U.S. A. 
Mr. G. N. Wilkinson, Department of Mathematical Statistics, C.S.I.R.O., Uni-
versity of Adelaide, Adelaide, South Australia.

Dr. Wm. H. Williams, Bell Telephone Laboratories, Murray Hill, New Jersey, 
U.S. A. 


New Members

At Large 

Mr. H. M. Dicks, Department of Agriculture, J. S. Marais Building, Stellenbosch,
South Africa. 

Dr. Enrique Carlos Roncoroni, Dorrego 757, Rosario, Argentina. 

Dr. Andre A. O. Varma, Department of Biostatistics, Bureau of Public Health, 
Paramaribo, Surinam. 


Belgian Region 

Mr. Pan-Leang-Cheav, Institut Agronomique, Gembloux, Belgium. 

Mr. J. Fraselle, 108 rue des Haeyettes, Salzinne, Namur, Belgium. 

Mr. M. Guisset, 13 rue Archimede, Brussels 4, Belgium. 

Mr. Gerard Torreele, D. P. V. Yangambi KM 17, Belgian Congo. 

Mrs. T. Van den Driessche, 21 Square du Castel fleuri, Brussels 17, Belgium.


German Region 

Dr. F. Nappen, Katzenburgweg 7-9, Bonn, Germany. 

Dr. F. X. Wohlzogen, Universitat Wien, 9, Schwarzspanierstrasse 17, Austria.

Prof. Dr. Alfred Zeller, Landwirtschaftlich-chemische, Bundes-Versuchsanstalt in 
Wien, II, Trunnerstrasse 1, Austria. 


Japan 

Mr. Tatsuo Ishihara, Kanagawa Prefectural Agr. Expt. Station, 496 Terada-nawa,
Hiratsuka-shi, Kanagawa-ken, Japan.

Mr. Shigeru Suzuki, National Institute of Agricultural Science, Chiba-shi, Chiba- 
ken, Japan. 


Norway 

Prof. H. K. Seip, Agricultural College of Norway, Vollebekk, Norway. 

Switzerland 

Dr. Med. Eugen Olbrich, Innsbruck, Mullerstrasse 59, Osterreich, Tirol, Switzerland.

ENAR 

Prof. Foster B. Cady, Jr., Statistical Laboratory, Iowa State University, Ames, 
Iowa, U.S.A.

Mr. Harry Hai-Hyung Cho, 130 Newbury Street, Boston 16, Massachusetts, U.S.A.

Mr. Eugene Cohen, Statistical Laboratory, University of Iowa, Ames, Iowa, U.S.A.

Miss Fredrica Greul, 704 Steamboat Road, Greenwich, Connecticut, U.S.A.

Mr. John R. Howell, Box 2377, University Station, Gainesville, Florida, U.S. A. 

Mr. Stanwyn G. Shetler, Department of Botany, University of Michigan, Ann 
Arbor, Michigan, U.S. A. 

Miss Janace Speckman, Box 416, Florida State University, Tallahassee, Florida, 
U.S. A. 

WNAR 

Mr. Peter A. Dawson, Department of Genetics, University of California, Berkeley 4, 
California, U.S. A. 

Mr. Albert R. Stage, 157 South Howard Street, Spokane 4, Washington, U.S. A. 
MEETINGS OF E.N.A.R.

The Eastern North American Region will meet jointly with the Institute of 
Mathematical Statistics in April 1961 at Cornell University. Titles and abstracts, 
the latter in duplicate in the form published in Biometrics, of contributed papers 
for E.N.A.R. should be sent to Dr. Erwin L. LeClerg, Biometrical Services, Plant 
Industry Station, Beltsville, Maryland. 

In 1961 E.N.A.R. will also meet jointly with the American Institute of Biological
Sciences at Purdue University, and with the American Statistical Association
in New York City.


THIRTEENTH ANNUAL MEETING OF THE BIOMETRIC SOCIETY 
(ENAR) WITH THE BIOMETRIC SOCIETY (WNAR), STANFORD, 
CALIFORNIA AUGUST 23-26, 1960 


Program 
MULTIVARIATE ANALYSIS 
Chairman: W. T. Federer—R. L. Anderson: Recent developments in multi- 
variate analysis. S. N. Roy: The future of multivariate analysis. R. E. Bargmann: 
On the problem of ordering variables in tracing significant contributions. 


DESIGN OF EXPERIMENTS—I 
Chairman: T. A. Bancroft. R. Gnanadesikan and M. B. Wilk: Some remarks
on plotting procedures in the analysis of experiments. B. V. Shah: Mixed factorials
in incomplete blocks. B. Kurkjian and M. Zelen: A general theory for asymmetrical
confounded factorial experiments. D. S. Robson: Cumulant component analysis.


REGRESSION 
Chairman: David S. Stoller. Max Halperin: Simple nearly least squares estimates
in heteroscedastic regression.


PROBLEMS IN MEDICAL STATISTICS 

Chairman: Walter E. Hoadley, Jr.—Bradley E. Copeland: Problems in pathology 
with statistical aspects. The Hon. Dale Alford: Statistical aspects of medical prob- 
lems of national interest. Discussion: Carl E. Hopkins and R. G. Hoffman. 


DESIGN OF EXPERIMENTS—II 

Chairman: R. L. Anderson—W. T. Federer: Augmented designs with two and 
three way elimination of heterogeneity. G. A. Baker and Burt Hoyle: Game theory 
applied to field trials. A. E. Brandt: Factorial chi-square.


STUDY OF GROWTH 

Chairman: Bhim S. Savara—B. S. Kraus: Current research and statistical 
needs in growth studies. R. F. Tate: A measure of support for a stochastic order. 
Harley B. Messinger: A geometrical model for height-weight control charts in
schoolchildren.


QUANTIFICATION AND MEASUREMENT 

Chairman: Clifford J. Maloney—P. Suppes: Foundations of measurement. 
Robert M. Thrall: A review of mathematical aspects of the theory of measurement. 
Robert S. Ledley: A sequential decision theory applied to medical diagnosis.
Discussion: S. S. Stevens. 
BIOMETRICAL GENETICS

Chairman: E. R. Dempster—O. Kempthorne and R. N. Curnow: The partial 
diallel cross. C. Kaller: Genetic-environmental interaction model. C. J. Mode: 
On the theory of improvement of metric traits in outbreeding populations. 


INVITED PAPERS—HONORING HAROLD HOTELLING AT HIS 65TH 

BIRTHDAY—I 

Chairman: Dorothy M. Gilford—S. N. Roy: General Introduction. R. 
Gnanadesikan and M. B. Wilk: One degree of freedom plots in multiresponse factorial 
experiments. R. Bargmann: Continuous responses, not necessarily normal, with 
applications. R. D. Bock: Applications in psychometry. 
CONTRIBUTED PAPERS—I 

Chairman: William E. Reynolds. H. M. C. Luykx and Betty L. Murray: Duration
of illness. R. M. Thorner and Q. R. Remein: Some aspects of screening tests
for the detection of disease suspects. C. L. Kaller and V. L. Anderson: An environ-
ment extension of chromosome analysis in population genetics. H. E. McKean and
B. B. Bohren: Numerical aspects of the regression on parent of offspring. Clifford
J. Maloney: Disease severity quantification.


CONTRIBUTED PAPERS—II 

Chairman: A. E. Brandt—William Taylor and Joseph Berkson: A problem of 
testing and estimation involving four-fold tables. William R. Gaffey: Tests of 
hypotheses concerning boundedness in convolutions. K. Abt: Analysis of variance 
of difference versus analysis of covariance. K. H. Lu: The means and variances of 
the products of two or three normal variables. J. Keilin: The use of the information 
statistic as a measure of conformity in comparing two sets of responses. Robert
Elston: On additivity in the analysis of variance. 


STOCHASTIC PROCESSES IN BIOLOGY AND MEDICINE 

Chairman: Albert T. Bharucha-Reid—John G. Kemeny: General remarks on 
Markov chains with illustrative examples in biology. ZL. Martin: Stochastic processes 
in biology and medicine. M.A. Kastenbaum: Countercurrent Dialysis—a stochastic 
process. A.C. Johnson: A stochastic model of incubation-period distributions. 


STATISTICS OF NATURAL POPULATIONS 

Chairman: Douglas G. Chapman—Richard C. Hennemuth: Estimating vital 
statistics of yellowfin tuna populations. A. L. Finkner and Scott Overton: Sampling 
of sportsmen for fish and game population statistics. E.C. Bryant and Donald W. 
King: Estimation from populations identified by overlapping sampling frames. 
D. W. Hayne: Some problems in terrestrial ecology. Douglas G. Chapman: Some 
contributions to population dynamics of the Alaska fur seal. 


INVITED PAPERS—HONORING HAROLD HOTELLING AT HIS 65TH 
BIRTHDAY—II 


Chairman: H. Solomon—S. N. Roy: General introduction. E. Diamond: 
Asymptotic power of the tests.
NEWS AND ANNOUNCEMENTS


Members are invited to transmit to their National or Regional Secretary (if members 
at large, to the General Secretary) news of appointments, distinctions, or retire- 
ments, and announcements of professional interest. 


AMERICAN INSTITUTE OF BIOLOGICAL SCIENCES 
TRANSLATION PROGRAM 


The American Institute of Biological Sciences is currently translating and 
publishing seven Russian research journals in biology. These journals are translated 
with support from the National Science Foundation, which is eager that such informa- 
tion be more widely distributed to biologists throughout the world. It is hoped that 
this material will aid biologists in research, prevent duplication of work, give some 
idea of the work being done by Soviet scientists in the field of biology, and also bring 
about a better international understanding among scientists. 

Because of the support of the National Science Foundation, the AIBS can 
offer these translations at a fraction of their publication cost, with even further 
price reduction to AIBS members and to academic and non-profit libraries. This 
reduction, the AIBS feels, places the translations within the reach of all biologists.

The journals currently being translated are: Doklady: Biological Sciences 
Section; Doklady: Botanical Sciences Section; Doklady: Biochemistry Section;
Plant Physiology; Microbiology; Soviet Soil Science; and Entomological Review. 

In addition to its program of Russian Biological Journal translations, the AIBS 
has instituted a separate program of translation and publication of selected Russian 
Monographs in biology. 

It was felt that the program of Journal translations was not sufficient to cover 
all of the significant work being done in all fields of biology by Russian scientists. 
With the aid of competent authorities, the AIBS has translated and published six 
Russian monographs and one monograph is in the process of being published. In
addition, several prominent monographs in various biological areas are being con-
sidered by the AIBS and the National Science Foundation for translation and
publication. The monographs that have been published are: Origins of Angio-
spermous Plants by A. L. Takhtajan; Problems in the Classification of Antagonists
of Actinomycetes by G. F. Gauze; Marine Biology, Trudi Institute of Oceanology, 
Vol. XX, edited by B. N. Nikitin; Arachnoidea by A. A. Zakhvatkin; and Arachnida 
by B. I. Pomerantzev. The manuscript for Plants and X rays by L. P. Breslavets 
is in the final stages of preparation and should be published early in 1960. 

Additional information pertaining to this program may be obtained by writing
to the American Institute of Biological Sciences, 2000 P Street, N. W., Washington
6, D. C.


NEWS ABOUT MEMBERS 


E.N.A.R. 


Raymond R. Allmaras recently accepted a position with the Agricultural 
Research Service, U.S.D.A., Soil and Water Conservation Research Division, 
Morris, Minnesota. 
Roger L. Bollenbacher is presently Statistician for the Bendix Corporation,
Elkhart, Indiana. He was formerly a student at Purdue University. 

Robert J. Buchler, Statistical Laboratory, Iowa State University, has been 
appointed as Visiting Lecturer in the Department of Statistics at the University 
of Minnesota for part of the summer term. 

Robert Buchler, Herbert David and Leroy Wolins of the Department of 
Statistics, Iowa State University, have been promoted from the rank of Assistant
Professor to Associate Professor.

Foster B. Cady of North Carolina State College has been appointed as Assistant 
Professor in Statistics at Iowa State University beginning July 1, 1960. 

Richard L. Carter has been appointed Professor of Management Engineering 
at Rensselaer Polytechnic Institute. Dr. Carter was formerly Associate Professor 
of Industrial Engineering at the Illinois Institute of Technology. 

Jack Chassan, formerly Chief of the Biometrics Branch at Saint Elizabeths 
Hospital in Washington, D. C., has taken a position as Mathematical Statistician 
with the Educational Statistics Branch, Office of Education, Department of Health, 
Education and Welfare, Washington, D. C. 

Richard G. Cornell is now an Associate Professor in the Department of Statistics 
of the Florida State University, Tallahassee, Florida. He was formerly Chief of 
the Laboratory and Field Station Statistics Unit, Communicable Disease Center, 
Atlanta, Georgia. 

Ira A. DeArmon, Jr. has taken a position with the U. S. Army Chemical Corps, 
Operations Research Group. He was formerly Analytical Statistician with the 
U. S. Army Chemical Corps Biological Laboratories, Fort Detrick, Maryland.

Lila Elveback, formerly Professor of Biostatistics at Tulane University, is now
Head, Statistics Unit, Division of Epidemiology of the Public Health Research 
Institute of the City of New York, Inc., New York, N. Y. 

John Gurland of the Department of Statistics, Iowa State University has been 
appointed as Visiting Professor at the U. S. Army Mathematics Research Center, 
University of Wisconsin for the period July 1, 1960-June 30, 1961. 

H. O. Hartley, Department of Statistics, Iowa State University is spending 
the summer in England where he and E. S. Pearson are at work on the second volume 
of Biometrika Tables. 

Johannes Ipsen, Jr., is Professor of Epidemiology and Medical Statistics at 
the Henry Phipps Institute, Philadelphia, Pennsylvania. He was formerly Superintendent,
Massachusetts State Institute of Laboratories, and Associate Professor of Public
Health, Harvard University.

Cecil L. Kaller received the Ph.D. in mathematical statistics from Purdue 
University in June, 1960. He has accepted the position of Assistant Professor of 
Mathematics at the University of Saskatchewan. 

Oscar Kempthorne, Department of Statistics, Iowa State University, has been
granted the research degree of Doctor of Science (Sc.D.) by Cambridge University,
England, on the recommendation of the Faculty of Mathematics and the Board of Research
Studies. 

Post Doctoral Fellows presently studying at the Statistical Laboratory, Iowa 
State University include: Samuel J. Kilpatrick of Ireland, Wiktor Oktaba of Poland, 
and Joseph Moder of Georgia School of Technology. 

Steven C. King is Chief of the Poultry Research Branch, U.S.D.A., Beltsville, 
Maryland. He was formerly Geneticist at the Mt. Hope Poultry Farm, Inc., at 
Batavia, N. Y. 
Marcus Kjelsberg has resigned as Instructor in Biostatistics at the Tulane
University Medical School, New Orleans, to continue his graduate work in the 
Biostatistics Division, University of Minnesota. 

Samuel B. Lyerly has accepted the position of Director of Research, Human 
Ecology Society, Silver Spring, Maryland. 

Aden C. Magee, formerly Graduate Research Assistant in the Department of 
Animal Industry at North Carolina State College, is now Assistant Research Pro- 
fessor in Nutrition at the School of Home Economics, Woman’s College, UNC, 
Greensboro, North Carolina. 

Stanley W. Nash, Associate Professor of Mathematics at the University of 
British Columbia, has been appointed as Visiting Associate Professor in Statistics 
at Iowa State University beginning July 1, 1960. 

John Rawlings has joined the staff of the Department of Experimental Statistics 
at North Carolina State College. Dr. Rawlings recently received his Ph.D. degree 
at this institution. 

Vincent Schultz has taken the position as Ecologist with the Division of Biology 
and Medicine, U. S. Atomic Energy Commission, Washington, D. C. He was 
formerly Agricultural Statistician, University of Maryland. 

Mindel C. Sheps is Associate Research Professor of Biostatistics, Department 
of Biostatistics, Graduate School of Public Health and Associate Research Pro- 
fessor in Preventive Medicine, Department of Preventive Medicine, Medical School, 
University of Pittsburgh. She was formerly Assistant Professor of Preventive
Medicine, Harvard Medical School. 

Ervin P. Smith, Associate Professor at Montana State College, has been ap- 
pointed as Visiting Associate Professor in Statistics at Iowa State University for 
the academic year 1959-1960. 

Robert G. D. Steel has accepted the position as Professor of Experimental 
Statistics at North Carolina State College, Raleigh, North Carolina. He was 
formerly Associate Professor of Biological Statistics at Cornell University, Ithaca, 
New York. 

Earl A. Thomas has left the Burroughs Research Center to take a position with
the Institute for Defense Analyses at McLean, Virginia.

Peter F. Wade is now Associate Director, Management Consulting Services, 
Price Waterhouse and Company, Montreal, Quebec, Canada. He was formerly 
with the Aluminum Company of Canada, Kingston, Ontario, Canada. 

Frank Wilcoxon is now Professor of Statistics, Department of Statistics, Florida 
State University, Tallahassee, Florida, on a half-time basis.

W. H. Williams, formerly Professor of Mathematics at McMaster University, 
Hamilton, Canada, has taken a position with Bell Telephone Laboratories, Murray 
Hill, New Jersey. 


W.N.A.R. 


Martha Agan is temporarily employed as a Medical Record Librarian, South 
Bay Hospital, Redondo Beach, California. 

Leo A. Aroian is now a member of the technical staff of Space Technology
Labs., Los Angeles. He formerly was the Senior Mathematical Consultant for 
Hughes Aircraft. 

Walter A. Becker has transferred from the Western Washington Experiment 
Station, Puyallup to the Department of Poultry Science, Washington State Univer- 
sity, Pullman, Washington. 
Ethelyne L. McBee has left the state of Florida and is now teaching mathematics
and science in the Tucson, Arizona public schools.

William F. Dossett is now employed as a systems methodologist (human 
factors) for the Technical Military Planning Operation (TEMPO), Defense Systems 
Department, Defense Electronics Division, General Electric Co., Santa Barbara, 
California. He previously had been on a consulting basis. 

W. V. Neisius has taken a new position as Computer Sales Manager of the 
Packard Bell Computer Corporation, California. He was formerly vice-president 
of Systematics, Inc. 

William E. Reynolds has taken a new position as Chief, Research Training, 
California State Department of Public Health, Berkeley. He was formerly Professor 
of Public Health and Preventive Medicine at the University of Washington. 

Donald V. Sisson is a graduate assistant in the Department of Zoology and 
Entomology, Iowa State University, transferring from his position as Assistant
Professor of Applied Statistics, Utah State University, Logan. 
TABLE OF CONTENTS


Les levures sélectionées en cidrerie—Variations de certains de leurs carac- 
On Finite Sample Distributions of GCL Identifiability Test Statistics
ROBERT L. BASMANN


A Note on the Limiting Relative Efficiency of the Wald Sequential Prob- 


Bivariate Exponential E. J. GUMBEL 
TECHNOMETRICS


A Journal of Statistics for the 
Physical, Chemical and Engineering Sciences 


Vol. 2, No. 3 August, 1960 


CONTENTS 

The Compound Hypergeometric Distribution and a System of Single Sampling 
Inspection Plans Based on Prior Distributions and Costs. ..... A. HALD
Some Remarks on the Bayesian Solution of the Single Sampling Inspection 

Serial Sampling Acceptance Schemes Derived from Bayes’s Theorem 
D. R. Cox 
Discussion of the Papers of Messrs. Hald, Wetherill and Cox. . G. A. BARNARD,
D. V. LINDLEY, F. J. ANSCOMBE, I. J. GOOD, AND G. HORSNELL


A Semigraphical Method for the Analysis of Complex Problems. . E. ANDERSON
Inter-plant Storage in Continuous Manufacturing............ H. D. Miter 
Vol. 2, No. 4 November, 1960

CONTENTS 

Statistical Life Test Acceptance Procedures.................... B. Epstein 


Some New Three Level Designs for the Study of Quantitative Variables 
G. E. P. BOX AND D. W. BEHNKEN
Graphical Procedure for Fitting the Best Line to a Set of Points. . .J. L. Doty 
Tables of Tolerance-Limit Factors for Normal Distribution:
On the Evaluation of the Negative Binomial Distribution with Examples
On Methods of Constructing Sets of Mutually Orthogonal Latin Squares
Using a Computer. . R. C. BOSE, I. M. CHAKRAVARTI AND D. E. KNUTH


Technometrics is published quarterly in February, May, August and 
November. To members of the American Statistical Association and the 
American Society for Quality Control the annual subscription rate is $6.00. 
The annual non-member subscription rate is $8.00. Checks should be made 
payable to Technometrics and addressed to Technometrics, Post Office Box 587, 
Benjamin Franklin Station, Washington 6, D. C. 
INTERNATIONAL JOURNAL OF ABSTRACTS
STATISTICAL THEORY AND METHOD 


A Journal of the International Statistical Institute 


The aim of this journal of abstracts is to give complete coverage 
of published papers in the field of statistical theory (including
associated aspects of probability and other mathematical methods) 
and new published contributions to statistical method. 


All contributions in the following five journals—being wholly 
devoted to this field—are abstracted: Annals of Mathematical 
Statistics; Biometrika; Journal, Royal Statistical Society (Series 
B); Bulletin of Mathematical Statistics; Annals, Institute of Sta- 
tistical Mathematics; and a further group of six journals are ab- 
stracted on a virtually complete basis as follows: Biometrics; 
Metrika; Metron; Review, International Statistical Institute; 
Technometrics; Sankhya. There are about 250 other journals 
partly devoted to statistical theory and method from which the 
appropriate papers are abstracted. 


The abstracts are about 400 words long—the recommendation 
of UNESCO for the “long” abstract service: they are in the Eng- 
lish language although the original language of the paper is 
noted on the abstract together with the name of the abstractor. In
addition, the addresses of the authors are given in detail to
facilitate contact in order to obtain further details or request an
off-print. The journal is published quarterly and contains ap-
proximately 1000 abstracts per year.


A scheme of classification has been developed for the abstracts 
that is flexible and facilitates the transfer of code numbers to 
punched cards. A unique aspect of this journal is that the pages 
are colour-tinted according to the main sections of classification. 
This method of colour-coding the pages provides a distinctive 
and powerful visual aid in the identification of abstracts in what- 
ever manner the journal is filed for reference. 


Annual Subscription £5 (U.S.A. and Canada $16.00) 
Single Number 30s (U.S.A. and Canada $4.50) 


OLIVER AND BOYD LTD. 
Tweeddale Court, 14 High Street, Edinburgh, 1
Cross-over design, 169
Death rate, see mortality
time-response curve
sampling, 455
Elasticity, 430 
see Drosophila, toxicology
Epidemiology, 489 
540
Errata, 474, 492, 695
Error, 
correlated, 457 
mean square, see variance
measurement, 306 
rate, 132, 673 
theory of, see analysis of variance,
least squares, maximum likelihood, 
models, rejection of data 
Estimation, 651, 660, and see bias, 
complete statistics, covariance ad- 
justment, efficiency, gene fre- 
quency, information, least squares, 
sequential, 312
structural, 467 
truncated sample, 360 
impossible, 367
Evolution, 36 
Expectation, mathematical, see moments 
conditional, 543 
Experimental design, see design of ex- 
periments 
Exponential curves, 582
Extrapolation, 451 
Extreme deviate, 312 
Factor analysis, 29, 127, 201, 312, 315, 
490, and see communalities, multi-
variate analysis
Factor loading, 30
Factorial experiments, 2, 696, and see
bioassay
analysis of, 310, 314, 566 
confounded, 115, 691 
Feedback, 481, and see reciprocal inter-
action
Fiducial limits, see confidence limits 
Field experiments, 127, 313, and see 
agronomy, design of experiments, 
plot size and shape 
Finger prints, see ridge counts 
Fish, 129, 132, 261, 354, 602, and see 
ecology 
Fitting constants, see least squares 
Fitting distribution, see estimation 
Fitting regression line, see covariance,
least squares, regression both vari-
ables subject to error, 606
Forecasting, see crop 
Forestry, 399 
Fourfold table, see chi-square, con-
tingency
F test, 662, and see analysis of variance, 
beta function 
interpretation of, see tests
multiple, see multiple range test 
power of, 593 
Gamma distribution, 250, 484, 538
Gene frequency, 
distribution of, 69 
estimation of, 534 
rate of change, 64, 488 
Generating function, see moments, prob- 
ability 
Genetic correlation, 235, 314, and see
genetic covariance, heritability, path 
coefficients 

Genetic covariance, 292 

Genetic homeostasis, see homeostasis 

Genetic model, see models 
blood groups, dominance, epistasis,
evolution, gene frequency, genetic, 
heritability, inbreeding, incompati- 
bility, linkage, mutation, natural 
selection, overdominance, path 
analysis, polymorphism, polyploidy, 
human, 136, 304, 534, and see ascer-
tainment
penetrance, 536 
population, 135, 195, 311, 314 
estimate, sampling error of, 126
index, 412 
population effect, 412
components, 369
Howard count,
Hyperbola, fitting, 606 
Hypothesis, see models, tests
null, 49, 177, 418, 593 
Identifiability, 
test, 477 
Inbreeding, 140, 146 
calculation of, 292 
depression, 298 
Incompatibility, 
self, 61 
Incomplete blocks, see lattices
analysis of, 566 
rank, 176 
cyclic, 567 
cyclic, 186
group-divisible, 183 
square, 185, 566
triangular, 184, 567, and see in- 
complete Latin squares 
two associate classes, 570 
simple, 567 
Incomplete experiments, see missing 
Industrial research, see chemistry, phys-
ical science
Infection theory, 582 
Inference, 133, and see tests 
Information, 162, 351, 454, 470, 491, 
and see analysis of variance, design 
of experiments, estimation, maxi- 
mum likelihood 
interblock, 566 
loss of, 117 
Inner product, 30 
Integral equations, 426 
Interaction, 161, and see analysis of 
variance, models 
as error term, 167
genotype-environment, 376 
interpretation of, 134 
reciprocal, 423 
lag, 423 
Interpolation, 451, 511 
convergence, 224




Jackson estimate, 358 
Judging, 86, and see en 
Karber’s method, 586 
K-statistics, see moments, 
Laboratory experiments, 698 
Lag, 423
Lagrange multipliers, 177, 468 
LD-50, see ED-50
369, 454, 467, 484, 607, 686, and
equations
approximate, 609 
internal, 229 
iterative computation, 224 
weighted, 94, 285, 369, 660
Life tables, 315, 618, and see actuarial 
statistics 
Likelihood, 53, 176, 447, 468, 484, and 
see likelihood ratio test, maximum 
likelihood 
Limnology, see fish 
Linkage, sex-, 241 
Logistic curve, see growth, logit trans- 
formation, populations 
Lorenz polynomials, 311, and see
Main effect, see analysis of variance,
Markov, 218, 643
modified, 660
nents of variance, feedback, growth,
hypothesis, missing values, path 
analysis, regression, transformations 
tion, correlation, structural analysis
Path coefficients, 189 
compound, 193 
sampling errors, 444 
Path regression, 189 
Penetrance, see genetics 
Perception, 281, and see organolepsis 
Periodicity, see cycles 
Pharmacology, 313, 488, and see anal- 
gesia, anesthesia, bioassay, toxi- 
cology 
Physical science, 461, and see chemistry, 
radioactivity
Physiology, 161, 213, 433, 607, and see 
endocrinology, medicine, threshold 
Plant spacing, 16, and see plot size and 
shape 
Plot size and shape, 375, and see plant 
spacing 
optimum, 456 
Poisson distribution, 358, 485, 489, 522, 
582 
conditional, 203 
moment ratios, 205 
truncated, 203, 446, 529 
Polykays, 272 
Polymorphism, 135 
Polyploidy, 311 
Population, see distribution, fish 
changes, 19 
dynamics, 355, 488 
management, see ecology 
model, 354, 358 
structure, 311, 354 
Power, see tests 
Precision, see information, least squares 
Prediction, 491, 496, and see crop, 
regression 
Preferences, see organolepsis 
Probability, 522 
a priori, see prior
a posteriori, see posterior 
generating function, 535, 621, 645 
posterior, 111 
prior, 112 


transition, 644 
Programming, see automatic computa- 
tion 
Proportions, see binomial 
Protection level, joint, 672 
Psychology, 550, and see organolepsis, 
psychomotor tests 
Public health, 308 
Quadratic forms, 44, 396 
Quadrat sampling, 486 
Quadruplets, 113 
Quality control, 339 
Quantification, see scales 
Queue, see stochastic processes 
Radioactivity, 128, 129, 132, 213, 642, 
697, and see radiology, tracers 
Radiology, 420, 506, 550 
Randomization, 41, 696, and see bias, 
selection 
rejection of unsatisfactory 
Randomized blocks, 39, and see incom- 
plete blocks, interblock error 
Random process, see stochastic 
Random net, 313 
Range, 301 
significant studentized, 672 
Rank, see transformations 
analysis of, 176 
Recurrence formulas, 513, 522 
Regression, see canonical analysis, cor- 
relation, covariance, fitting regres- 
sion line, orthogonal polynomials, 
path regression 
analysis, see analysis of covariance 
asymptotic, 125 
multiple, 224 
bilinear, see multiple 
coefficient, 
analysis of, 121, 552 
estimation of, 399 
homogeneity of, 593 
curvilinear, 401, 604 
test of, 462 
equation, reduced, 465 
external, 8 
heteroscedastic, 399 
intercept, 593 
internal, 8 
model, see structural 
multiple, 200, 285, 458 
missing values, 131 
parent-offspring, 34%
partial, 192 
polynomial, 231 
through origin, 453 
two error components, 441 
Rejection of data, 132, and see missing 
values, selection 
Repeatability, see genetic correlation, 
heritability 
Replication, 580
Residual, see error 
normal, 608 
Response, see bioassay, dose-response, 
model, time-response 
quantal, 582 
correlated, 491 
multiple, 382 
172 
sequential, 127 
versus graded, 162 
surface, 168 
Reviews, 304, 308, 486, 696
Ridge counts, 110 
Sample size needed, 636, and see optional 
stopping, organolepsis 
Sampling, 128, 261, and see components 
of variance, design of experiments, 
sample size needed 
ecological, 51 
error, variance 
minimax, 341 
preliminary, 339 
stratified, 262 
studies of statistical problems, 472, 
and see Monte Carlo 
Scales, 87
Scores, see discriminant function, organ-
olepsis, ranks, scales
Screening tests, see bioassay
Selection, see choice of transformation,
design of experiments, genetic selec- 
tion, organolepsis, randomization, 
rejection of data, sampling 
natural, 38, 61, 135, and see com- 
petition 
of data, 3 
of error term, see proper error term 
of experimental units, 29 
of model, see model 
of variates, 312 
Sensitivity, 169, 392 
data, see quantal response
Sensory tests, see organolepsis
Serology, see blood groups, hematology 
Sex ratio, 23 
Sign test, 86 
Simultaneous equations, see matrix 
Skewness, 251 
Smoothing, see curve fitting
Sociology, see demography 
Split plots, 451, and see covariance
adjustment 
Standard deviation, see variance 
Standard error, J, and see variance 
Statistical control, see analysis of 
variance 
Statistical methods, research needed, 394 
Statistics texts and periodicals, 308 
Steepest ascent (descent), method of, 490
Stirling approximation, see gamma func- 
tion 
Stirling numbers, 525 
Stochastic processes, 486, 489, 618 
Stochastic state, 231 
Stop rule, see optional stopping 
Structural analysis, 311, 464, 481 
Student’s t, see test
Subjective evaluation, see
organolepsis 
Successive approximation, see iteration 
Sufficient statistics, see estimation 
Survival curve, see dose-response curve,
mortality, time-response curve 
Survival time, see time-response curve 
Systematic designs, see order statistics 
Tables, miscellaneous, 54, 205, 207, 531,
638, 675 
graphical, 206, 208 
Target theory, see radiology 
Taste tests, see organolepsis
Taste threshold, 41, 245
Taxonomy, 29, 489, and see morphology
Taylor series, 359 
Tests, see analysis of variance, chi-
square, comparisons, confidence
limits, F test, goodness of fit,
judging, likelihood ratio, null
hypothesis, order, protection level,
ranks, rejection of data, sign test

exact, 44 

Hotelling’s T, 42

likelihood ratio, 177, 255, 556 
multiple range, Duncan’s, 133, 134,
553, 671 
multivariate, 41 
of model, 179 
of number of dimensions, 29 
of significance, 166 
of largest difference, see multiple 
range test 
permutation, 50 
power of, 86, 583 
psychomotor, 550 
randomization, 49 
sensitivity of, 46 
using range, see multiple range test 
Theory, see biometry, hypothesis, model
Threshold, 87 
Time-response curve, 132, 163, and see 
dose-response curve 
Tolerance, 397, 488 
Toxicology, 382, and see bioassay,
pharmacology 
Tracers, 212, 642, and see radioactivity 
Transformations, 167, 382, 659, and see
additivity, analysis of variance,
bioassay, canonical analysis, dis-
criminant function, matrix, model,
angular, 107, 167 
choice of, 48 
homoscedastic, 340 
inverse sine, see angular 
logit, 167, 386 
multivalued, 664 
multivariate, 42 
normit, 384 
probability, 659 
probit, 167 
square root, 486 
to linearize regression, 607 
Tree crops, see crop, horticulture 
Trend, see regression
Tribolium, 19 
Triplets, 112 
T test, see confidence limits, Hotell- 
ing’s T 
Tuberculosis, 308 
Tumor, 584, and see cancer 
Twins, 110, 304, and see zygosity
diagnosis 
Variance, see covariance, F test, genetic 
analysis of, 117, 120, 121, 462, 552, 
568, 690, and see additivity, 
analysis of covariance, chi-square, 
components of variance, degrees 
of freedom, error, F test, inter- 
action, least squares, missing 
values, model, multiple F test,
multivariate analysis, orthogonal 
polynomials, path coefficients, 
regression, structural analysis, 
tests, transformations 
computation of, 164 
hierarchical, 136 
interpretation of, 133, 134 
asymptotic, 362, 519, 585 
components, 267, 314, 5538, and see
components of covariance, struc- 
tural analysis 
estimated, 301 
heterogeneity of, 94 
homogeneity of, multivariate analogue, 
548 
interblock, see information 
matrix, see covariance matrix 
of difference of adjusted means, 15 
of estimate, 106, 357, 570 
of proportion, 180 
of ratio, 519 
of regression coefficient, 348 
of threshold, 256 
of variance, 369 
ratio, see beta function, F test 
sampling, 265 
theoretical, 174 
Vector, 236, 272 
Virology, 126, 582 
Vital statistics, 308, and see actuarial 
statistics, demography 
Weighting, 170, 282, 375, 384, 401, 484, 
569, 593 
bias, 172 
genetic, 235 
Z test, see F test 
Zygosity diagnosis, 110, 305 